Information plays an important role in our understanding of the physical world. We hence propose
an entropic measure of information for any physical theory that admits systems, states and
measurements. In the quantum and classical world, our measure reduces to the von Neumann and 
Shannon entropy respectively. It can even be used in a quantum or classical setting where we are 
only allowed to perform a limited set of operations. In a world that admits superstrong correlations 
in the form of non-local boxes, our measure can be used to analyze protocols such as superstrong 
random access encodings and the violation of 'information causality'. However, we also show that 
in such a world no entropic measure can exhibit all properties we commonly accept in a quantum 
setting. For example, there exists no 'reasonable' measure of conditional entropy that is subadditive. 
Finally, we prove a coding theorem for some theories that is analogous to the quantum and classical 
setting, providing us with an appealing operational interpretation. 



I. INTRODUCTION 

Understanding information in classical and quantum 
physics has helped us shed light on the fundamental na- 
ture of these theories. Indeed, it has even been sug- 
gested that quantum theory could be more naturally 
formulated in terms of its information-theoretic proper- 
ties [3 [TOl US] . Yet, we have barely scratched the sur- 
face of understanding the role of information in the natu- 
ral world. To gain a deeper understanding of information 
in physical systems, and to help explain why nature is 
quantum, it is sometimes instructive to take a step back 
and view quantum mechanics in a much broader context 
of possible physical theories. Many examples are known 
that indicate that if our world were only slightly differ- 
ent, our ability to perform information processing tasks 
could change dramatically [21 [HI HS1 HH [331 [23 El] 

However, before we can hope to really investigate gen- 
eral theories from the perspective of information process- 
ing, we first need to find a way to quantify information. 
In a quantum and classical world, this can be done us- 
ing the von Neumann and Shannon entropy respectively, 
which capture our notions of information and uncertainty 
in an intuitive way. These quantities have countless prac- 
tical applications, and have played an important role in 
understanding the power of such theories with respect to 
information processing. 

Here, we propose a measure of information that ap- 
plies to any physical theory [U] which admits the min- 
imal notions of finite physical systems, their states, and 
the probabilistic outcomes of measurements performed 
on them. Many such theories have been suggested, 
each of which shares some aspects with quantum theory, 
yet have important differences. For example, we might 
consider quantum mechanics itself with a limited set 
of allowed measurements, quantum mechanics in a real 
Hilbert space, generalized probabilistic theories [U [3],
general C*-algebraic theories [TU], box world [35] (a theory
admitting all non-signalling correlations [271 H2J, previously
called Generalized Non-Signalling Theory [3]),
classical theories with an epistemic restriction [33] or the- 
ories derived by relaxing uncertainty relations [35]. 



A. A measure of information 

1. Entropy 

We propose an entropic measure of information H that 
can be used in any such theory in Section IV A. We will
show that our measure reduces to the von Neumann and 
Shannon entropy in the quantum and classical setting 
respectively. In addition, we show that it shares many 
of their appealing intuitive properties. For example, we 
show that the quantity is always positive and bounded 
for the finite systems we consider. This provides us with 
a notion that each system has some maximum amount of 
information that it can contain. Furthermore, we might
expect that mixing increases entropy, i.e. that the entropy
of a probabilistic mixture of states cannot be less
than the average entropy of its components. This is indeed
the case for our entropic quantity. Another property
that is desirable of a useful measure of information is
that it should take on a similar value for states which are 
'close', in the sense that there exists no way to tell them 
apart very well. This is the case for the von Neumann and 
Shannon entropy, and also for our general entropic quan- 
tity, given one extra minor assumption. Finally, when 
considering two different systems A and B, one may con- 
sider how the entropy of the joint system AB relates to 
the entropy of the individual systems. It is intuitive that 
our uncertainty about the entire system AB should not 
exceed the sum of our uncertainties about A and B indi- 
vidually. This property is known as subadditivity and is 
obeyed by our measure of entropy given one additional 
reasonable assumption on the physical theory. Our entropic
quantity thus behaves in very intuitive ways. Yet,
we will see that there exist physical theories for which it is 
not strongly subadditive, unlike in quantum mechanics. 

Of course, there are multiple ways to quantify infor- 
mation and we discuss our choice by examining some 
alternatives and possible extensions such as notions of 
accessible information, relative entropy as well as Rényi
entropic quantities in Sections IV C and IV D.



2. Conditional entropy and mutual information 

Clearly, it is also desirable to capture our uncertainty 
about some system A conditioned on the fact that we 
have access to another system B. This is captured by 
the conditional entropy, for which we provide two definitions
in Section IV B which are both interesting and
useful in their own right. Based on such definitions we 
also define notions of mutual information which allow us 
to quantify the amount of information that two systems 
hold about each other. Our first definition of conditional 
entropy is analogous to the quantum setting, and indeed 
reduces to the conditional von Neumann entropy in a 
quantum world. This is an appealing feature, and opens 
the possibility of interesting operational interpretations 
of this quantity as in a quantum setting [20, 21]. Yet, we
will see that there exists a theory (called box world) for 
which not only the subadditivity of the conditional en- 
tropy is violated, but also where conditioning increases 
entropy. Intuitively, we would not expect to grow more 
uncertain when given additional information, which we 
could always choose to ignore. 

We will hence also introduce a second definition of con- 
ditional entropy, which does not reduce to the von Neu- 
mann entropy in the quantum world. However, it has the 
advantage that in any theory conditioning reduces our 
uncertainty, as we would intuitively expect when taking 
an operational viewpoint. Nevertheless, even our second 
definition of the conditional entropy violates subadditiv- 
ity. 



3. Possible properties of the conditional entropy

Naturally, one might ask whether the fact that both
our definitions of the conditional entropy violate
subadditivity is simply a shortcoming of our definitions. In
Section VI we therefore examine what properties any
'reasonable' measure of conditional entropy can have in
principle. By reasonable here we mean that if given access
to a system B we have no uncertainty about some
classical information A, then the quantity is 0, and
otherwise it is positive (or even non-zero). We show that
under this simple assumption there exists no measure of
conditional entropy in box world that is subadditive or
obeys a chain rule.



B. Examples

To give some intuition about how our entropies can be
used outside of quantum theory, we examine a very simple
example in box world in Section V which illustrates
all the peculiar properties our entropies can have. This is
based on a task in which Alice must produce an encoding
of a string x, such that Bob can retrieve any bit of his
choosing with some probability [35] (known as a random
access encoding). It is known that superstrong random
access codes exist in box world [35], leading to a violation
of the quantum bound for such encodings [25].

A similar game was used in [26] to argue that one
of the defining characteristics that sets the quantum
world apart from other possibilities (and particularly box
world) is that communication of m classical bits causes
information gain of at most m bits, a principle called
'information causality'. In Section VII we examine this
statement using our entropic quantity. We notice that
it is the failure of subadditivity of conditional entropy in
box world that leads to a violation of the inequality
quantifying 'information causality' given in [26]. We conclude
our examples by discussing the definition of 'information
causality' more generally.



C. A coding theorem

In the classical, as well as the quantum setting, the
Shannon and von Neumann entropies have appealing
operational interpretations as they capture our ability to
compress information. In Section VIII we show that the
quantity H(·) has a similar interpretation for some
physical theories. When defining entropy we have chosen to
restrict ourselves to a minimal set of assumptions, only
assuming that a theory would have some notion of states
and measurements. To consider compressing a state or
indeed decoding it again, however, we need to know a
little more about our theory. In particular, we first have
to define a notion of 'size' for any compression procedure
to make sense. Second, we need to consider what
kind of encoding and decoding operations we are allowed
to perform. Given these ideas, and several additional
assumptions on our physical theory, we prove a simple
coding theorem.



D. Outline 

In Section II we introduce a framework for describing
states, measurements and transformations in general
physical theories, followed in Section III by some examples.
In Section IV we then define our entropic measures
of information that can be applied in any theory. Examples
of how these entropies can be applied in box world
can be found in Section V. In Section VI we examine what
properties we can hope to expect from a conditional entropy
in box world. Section VII investigates the notion of
'information causality' in our framework and finally we
show a coding theorem for many theories in Section VIII.
We conclude with many open questions in Section IX.

II. AN OPERATIONAL FRAMEWORK FOR 
PHYSICAL THEORIES. 

We now present a simple framework, based on minimal 
operational notions (such as systems, states, measure- 
ments and probabilities) , that encompasses both classical 
and quantum physics, as well as more novel possibilities 
(such as 'box world') [TJ [3J [TTJ [TB] . Our approach is sim- 
ilar to that in [T], however it is slightly more general as 
it does not assume that all measurements that are mathematically
well-defined are physically implementable, or
that joint systems can be characterised by local measure- 
ments. 

A. Single systems and states. 

Firstly, we will assume that there is a notion of discrete
physical systems. With each system A we associate
a set of allowed states $\mathcal{S}_A$, which may differ for
each system. We furthermore assume that we can prepare
arbitrary mixtures of states (for example by tossing
a biased coin, and preparing a state dependent on the
outcome), and therefore take $\mathcal{S}_A$ to be a convex set, with
$S_{mix} = p S_1 + (1 - p) S_2$ denoting the state that is the
mixture of $S_1$ with probability $p$ and $S_2$ with probability
$1 - p$. To characterize when two states are the same, or
close to each other, we first need to introduce the notion 
of measurements. 

B. Measurements 

Secondly, we thus assume that on each system A, we
can perform a certain set of allowed measurements
$\mathcal{E}_A = \{e\}$. If the system A is clear from context, we will omit
the subscripts and simply write $\mathcal{E}$ and $\mathcal{S}$.

With each measurement e we associate a set of outcomes
$\mathcal{R}_e$, which for simplicity of exposition we take to
be finite. When a particular measurement is performed
on a system, the probability of each outcome should be
determined by its state. We therefore associate each possible
outcome $r \in \mathcal{R}_e$ with a functional $e_r : \mathcal{S} \to [0, 1]$,
such that $e_r(S)$ is the probability of obtaining outcome r
given state S. We refer to such a functional as an effect.
To ensure that measurement behaves according to our
intuition when applied to mixed states, we require that
$e_r(S_{mix}) = p\, e_r(S_1) + (1 - p)\, e_r(S_2)$. This means that
each effect can be taken to be linear |3S]. In order for
the probabilities of all measurement outcomes to sum to
one, we also require that

$$\sum_{r \in \mathcal{R}_e} e_r = u, \tag{1}$$

where u is the unit effect, which has the property that
$u(S) = 1$ for all $S \in \mathcal{S}$. We can thus characterize a
measurement e as a set of outcome/effect pairs

$$e = \Big\{ (r, e_r) \;\Big|\; r \in \mathcal{R}_e \text{ and } \sum_r e_r = u \Big\}. \tag{2}$$

We write e(S) for the probability distribution over out- 
comes when e is performed on a state S. Note that in 
this general framework, not all measurements that are 
mathematically well-defined need be part of a particular 
physical theory. 

One measurement can be equivalent to, or strictly
more informative than, another. Consider two measurements
e (with outcomes $\mathcal{R}_e$ and effects $e_r$) and f (with
outcomes $\mathcal{R}_f$ and effects $f_{r'}$), for which there exists a map
$M : \mathcal{R}_e \to \mathcal{R}_f$ such that

$$\sum_{\{r \,:\, M(r) = r'\}} e_r = f_{r'}, \quad \forall r' \in \mathcal{R}_f. \tag{3}$$

If M is one-to-one it corresponds to a re-labelling of the
outcomes. Otherwise, we say that f is a coarse-graining
of e (or alternatively that e is a refinement of f). Because
we can always re-label the outcomes of an experiment
according to any map M, we assume that $\mathcal{E}$ is closed
under re-labelling and coarse-graining. This implies that
$\mathcal{E}$ always contains the trivial measurement u (with one
outcome corresponding to effect u).

A refinement/coarse-graining is trivial if

$$e_r \propto f_{M(r)} \quad \forall r \in \mathcal{R}_e. \tag{4}$$

In this case, the measurement of e is equivalent to performing
f and obtaining r', then outputting a randomly
selected r satisfying M(r) = r' (where the distribution
depends on the proportionality constant in (4)). Hence
the two measurements are equally informative about the
state. In contrast, when e is a non-trivial refinement of
f it offers strictly more information about the state, and
in this case we write $e \succ f$. A subset of measurements
of particular importance are the fine-grained measurements
$\mathcal{E}^* \subseteq \mathcal{E}$, which have no non-trivial refinements, and
are therefore optimal for gathering information about the
state. Formally,

$$e \in \mathcal{E}^* \iff \nexists\, f \in \mathcal{E} : f \succ e. \tag{5}$$

We will also call an effect e fine-grained if it is part of
a fine-grained measurement. We assume that $\mathcal{E}^*$ is non-empty
(i.e. that there exists at least one finite outcome
fine-grained measurement). In quantum and classical 
theory this restricts us to the finite-dimensional case. 
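
The coarse-graining relation of Eq. (3) can be illustrated in classical probability theory, where each effect is simply a vector of conditional probabilities. The following is a minimal sketch, assuming that effects are stored as Python lists indexed by the underlying classical value; the helper names (`outcome_probs`, `is_normalised`, `coarse_grain`) are illustrative and are not taken from the paper.

```python
from collections import defaultdict

def outcome_probs(measurement, state):
    """Probability of each outcome r: e_r(S) = sum_i e_r[i] * p[i]."""
    return {r: sum(q * p for q, p in zip(effect, state))
            for r, effect in measurement.items()}

def is_normalised(measurement, tol=1e-12):
    """Check that the effects sum to the unit effect u = (1, ..., 1), Eq. (1)."""
    dim = len(next(iter(measurement.values())))
    return all(abs(sum(e[i] for e in measurement.values()) - 1.0) < tol
               for i in range(dim))

def coarse_grain(measurement, M):
    """Build f from e via Eq. (3): f_{r'} is the sum of e_r over all r with M(r) = r'."""
    dim = len(next(iter(measurement.values())))
    f = defaultdict(lambda: [0.0] * dim)
    for r, effect in measurement.items():
        f[M[r]] = [a + b for a, b in zip(f[M[r]], effect)]
    return dict(f)

# A fine-grained measurement on a classical trit: read out the value exactly.
e = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0], 2: [0.0, 0.0, 1.0]}
# Coarse-graining map M: outcomes 1 and 2 are merged into the label "high".
M = {0: "low", 1: "high", 2: "high"}
f = coarse_grain(e, M)

state = [0.5, 0.3, 0.2]
probs_e = outcome_probs(e, state)
probs_f = outcome_probs(f, state)
```

Here f is a non-trivial coarse-graining of e, since the merged effects for outcomes 1 and 2 are not proportional to the effect of "high"; the readout e therefore distinguishes states that f cannot.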

C. Transformations 

As well as preparing states and performing measure- 
ments, it may be possible to perform transformations on 
a system. As in the case of effects, in order to behave
reasonably when applied to mixed states, a transformation
must correspond to a linear map $T : \mathcal{S}_A \to \mathcal{S}_{A'}$
taking allowed states to allowed states (although the input
and output systems may be of a different type). For
each type of system, there will be some set of allowed
transformations $\mathcal{T}$.

We assume that the identity transformation $I$ is allowed,
and that the composition of two allowed transformations
is allowed (as long as the system output by the
first transformation is of the same type as the input to
the second). Furthermore, it must be the case that any
allowed transformation followed by an allowed measurement
is an allowed measurement.

We can also combine the notion of transformation
with that of measurement in a natural way to represent
non-destructive measurements [3] lllj . To incorporate
non-destructive measurements, define the sub-normalised
states $\tilde{\mathcal{S}} = \{ pS \mid 0 \le p \le 1,\ S \in \mathcal{S} \}$. A measurement can
then be described by assigning a sub-normalised transformation
$t_r : \tilde{\mathcal{S}} \to \tilde{\mathcal{S}}'$ to each outcome r. Result r occurs
with probability $p_r = u(t_r(S))$ and the post-measurement
state is $S_r = t_r(S)/p_r$. However, we will not need such
constructions in the main part of this paper.



D. Relations between states 

Having introduced measurements, we can now define
what it means for two states to be equal. Given that we
are taking an operational viewpoint, we adopt the intuitive
notion that two states $S_1, S_2 \in \mathcal{S}$ are equal, if and
only if there exists no measurement that distinguishes
them. That is,

$$\forall S_1, S_2 \in \mathcal{S}: \quad S_1 = S_2 \iff \forall e \in \mathcal{E} : e(S_1) = e(S_2). \tag{6}$$

We can also define a natural measure of distance for
states $S_0, S_1 \in \mathcal{S}$ that directly relates to the probability
that we can distinguish these states using measurements
available in our theory, in analogy to the quantum setting [22].
Suppose we are given either $S_0$ or $S_1$ with equal
probability, and perform a measurement e to distinguish
the two cases. Note that the above implies that any theory
that admits at least two possible states has at least
one measurement e with two possible outcomes. Furthermore
any such theory must have a measurement e with
exactly two outcomes since any theory admits arbitrary
coarse-grainings of measurements. We will base our decision
on the maximum likelihood rule, that is, when we
obtain outcome r, we will conclude we received state $S_0$
if $e_r(S_0) > e_r(S_1)$ and $S_1$ otherwise. The probability
of distinguishing the two states using measurement e is
then given by

$$\frac{1}{2} + \frac{1}{2}\, C(e(S_0), e(S_1)), \tag{7}$$

where $C(e(S_0), e(S_1)) = \frac{1}{2} \sum_{r \in \mathcal{R}_e} |e_r(S_0) - e_r(S_1)|$ is the
classical statistical distance between the probability distributions
$e(S_0)$ and $e(S_1)$. (Under the maximum likelihood rule the
success probability is $\frac{1}{2} \sum_r \max\{e_r(S_0), e_r(S_1)\}$, and
$\sum_r \max\{a_r, b_r\} = 1 + \frac{1}{2} \sum_r |a_r - b_r|$ for normalised
distributions.) We now define the distance as

$$\mathcal{D}(S_0, S_1) := \sup_{e \in \mathcal{E}} C(e(S_0), e(S_1)). \tag{8}$$

By the above, we see that this measure of distance
has an appealing operational interpretation because it
directly captures our ability to distinguish the two states
$S_0$ and $S_1$ using any available measurement (see appendix A,
Lemma A.1 for details). In the quantum setting, it
thus directly reduces to the well-known trace distance.



E. Multi-partite systems

Suppose that we have two systems A and B, each of
which may admit different sets of states and measurements.
We allow that two individual systems can be
combined into a composite system AB, which we can
treat as a new type of system having its own set of allowed
states, measurements, and transformations just as
in the single-system case. However, these sets must bear
some relation to those of the component subsystems.

With respect to states, we would like it to be possible
to independently prepare any state $S_A \in \mathcal{S}_A$ of system A
and $S_B \in \mathcal{S}_B$ of system B. This corresponds to a product
state of the composite system, which we denote by
$S_{AB} = S_A \otimes S_B \in \mathcal{S}_{AB}$. Note that at this point we have
not proved that $\otimes$ corresponds to a tensor product in the
usual sense [47], but we would nevertheless expect that
it is distributive for mixtures and associative. We make
use of the standard terminology that states are separable
if they can be written as a mixture of product states,
and entangled otherwise. To avoid excessive subscripts
when dealing with multiple systems, we will usually refer
to the state of systems AB and A directly by these
letters, rather than the more cumbersome $S_{AB}$ and $S_A$
(e.g. $e(S_{AB}) = e(AB)$ etc.).

Similarly, we would expect to be able to perform a measurement
$e \in \mathcal{E}_A$ and $f \in \mathcal{E}_B$, giving a product measurement
which we denote by $g = e \otimes f \in \mathcal{E}_{AB}$ (with outcome
set $\mathcal{R}_g = \mathcal{R}_e \times \mathcal{R}_f$ and effects $g_{ij} = e_i \otimes f_j$). By considering
coarse-graining and tri-partite systems, we would
again expect $\otimes$ to be distributive and associative. When
applying a product measurement to a product state we
furthermore require that

$$(e_i \otimes f_j)(A \otimes B) = e_i(A)\, f_j(B). \tag{9}$$

When considering multiple systems, we can consider
what happens if we only measure some of these systems.
Note that this means that we perform a measurement
consisting of a unit effect on some of these systems. This
only makes sense if marginal states are well defined and
we hence assume that even when a bipartite state is entangled
each part is an allowed marginal state. We can
thus have

$$\forall AB \in \mathcal{S}_{AB},\ \exists A \in \mathcal{S}_A : \forall e \in \mathcal{E}_A,\ e(A) = (e \otimes u)(AB). \tag{10}$$

Furthermore, in the case in which B performs a measurement
on his subsystem and obtains result r (corresponding
to an effect $e_r$) we would expect A's subsystem to
'collapse' to an allowed state $A_{|r} \in \mathcal{S}_A$. We will denote
such a state as

$$A_{|r} \propto (I \otimes e_r)(AB). \tag{11}$$

Finally, a crucial constraint on multi-partite systems is
the existence of product transformations $T_A \otimes T_B \in \mathcal{T}_{AB}$.
In a variant of quantum theory in which all positive
(rather than completely positive) trace-preserving maps
are allowed transformations, this would prevent the existence
of entangled states.
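
The product rule (9) and the marginal-state condition (10) can be made concrete in classical probability theory. The sketch below is only an illustration under stated assumptions: a bipartite classical state is stored as a nested list `joint[a][b]`, an effect is a list of numbers in [0, 1], and all function names are hypothetical.

```python
def apply_effect(effect, state):
    """e(S) = sum_i effect[i] * state[i] for a single classical system."""
    return sum(q * p for q, p in zip(effect, state))

def product_state(pa, pb):
    """Independently prepared systems: P(a, b) = pa[a] * pb[b]."""
    return [[x * y for y in pb] for x in pa]

def apply_product_effect(e, f, joint):
    """(e ⊗ f)(AB) = sum_{a,b} e[a] * f[b] * joint[a][b]."""
    return sum(e[a] * f[b] * joint[a][b]
               for a in range(len(e)) for b in range(len(f)))

def marginal_A(joint):
    """Marginal state of the first system."""
    return [sum(row) for row in joint]

e = [0.8, 0.3]          # an effect on system A
f = [0.1, 0.6]          # an effect on system B
unit = [1.0, 1.0]       # the unit effect u on system B
pa, pb = [0.7, 0.3], [0.4, 0.6]
correlated = [[0.5, 0.0], [0.0, 0.5]]   # separable but non-product state

# Eq. (9): product effects factorise on product states.
lhs_eq9 = apply_product_effect(e, f, product_state(pa, pb))
rhs_eq9 = apply_effect(e, pa) * apply_effect(f, pb)

# Eq. (10): measuring e on the marginal equals measuring e ⊗ u on AB.
lhs_eq10 = apply_effect(e, marginal_A(correlated))
rhs_eq10 = apply_product_effect(e, unit, correlated)
```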



III. EXAMPLE THEORIES 

In this section we show how quantum theory and
classical probability theory fit into the framework defined
above, and also describe the theory known as 'box
world', which admits all non-signalling correlations
[22l E2], and was one of the main motivations for
this work.



A. Classical probability Theory

In classical probability theory, a state S corresponds to
a probability distribution $p_i$ over a finite set of elements.
The effects correspond to linear functionals of the form

$$e_r(S) = \sum_i q_r^i\, p_i \tag{12}$$

for any $q_r^i \in [0, 1]$. Note that the unit effect corresponds
to $q^i = 1\ \forall i$. Normalisation of measurements therefore
requires $\sum_r q_r^i = 1\ \forall i$. Transformations correspond to
stochastic maps.



B. Quantum Theory

In quantum theory, the convex set of states are the
density operators $S = \rho$ (trace-1 positive operators), and
effects correspond to linear functionals of the form

$$e_r(S) = \mathrm{tr}(\rho E_r) \tag{13}$$

where $E_r$ is a positive operator. All measurements satisfying
the normalisation constraint

$$\sum_r E_r = \mathbb{I} \tag{14}$$

are allowed, and the fine-grained measurements are those
for which all $E_r$ are rank 1 operators. The allowed transformations
represent completely positive trace-preserving
maps fM\ .



C. Restricted Quantum/classical theories

Note that unlike other approaches [TJ [3] our framework
also encompasses real Hilbert space quantum mechanics.
Furthermore, because we do not assume that all
well-defined operations are physically realizable, it can
be used to study quantum or classical theory with a restricted
set of states, measurements and transformations
(for an interesting example in the classical case consider
Spekkens' toy model [31]). The entropies we would assign
in such cases would differ from the standard von
Neumann entropy, and may be interesting to study.



D. Box world

In box world, the state of a single system X corresponds
to a conditional probability distribution
$S = P(x_{out}|x_{in})$ where $x_{in}$ and $x_{out}$ are elements of a finite
set of 'inputs' and 'outputs' respectively. The intuition is
that there is a special set of measurements on each system
represented by $x_{in}$ (referred to as fiducial measurements),
and that any probability distribution for these measurements
corresponds to an allowed state. We represent a
system X with k possible inputs $x_{in}$ and m possible outputs
$x_{out}$ by

[Figure: a box labelled X with an input line $x_{in}$ and an output line $x_{out}$]

In the special case in which there is only one possible
input, the conditional probability distribution reduces
to the standard unconditional probability distribution
$P(x_{out})$, and we omit the input line to the box in the
diagram. Thus box world contains classical probability
theory as a special case, and we will use such classical
boxes to represent classical information in our treatment
of information-theoretic protocols in box world.
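
As a concrete illustration of the classical and quantum effects of Eqs. (12) and (13), and of the success probability of Eq. (7), the sketch below evaluates outcome probabilities for a classical bit and for a qubit. It is a minimal example under stated assumptions: states and effects are stored as plain Python lists, the qubit state is written as a 2x2 density matrix, and all function names are hypothetical rather than taken from the paper.

```python
def classical_probs(q, p):
    """Eq. (12): e_r(S) = sum_i q[r][i] * p[i] for each outcome r."""
    return [sum(q_ri * p_i for q_ri, p_i in zip(q_r, p)) for q_r in q]

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def quantum_probs(effects, rho):
    """Eq. (13): e_r(S) = tr(rho E_r) for each positive operator E_r."""
    probs = []
    for E in effects:
        m = matmul2(rho, E)
        probs.append(complex(m[0][0] + m[1][1]).real)
    return probs

def statistical_distance(p0, p1):
    """Classical statistical distance C = 1/2 * sum_r |p0[r] - p1[r]|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p0, p1))

def success_probability(p0, p1):
    """Maximum-likelihood guess with equal priors: 1/2 * sum_r max(p0[r], p1[r])."""
    return 0.5 * sum(max(a, b) for a, b in zip(p0, p1))

# Classical bit: a noisy readout that reports the wrong value 10% of the time.
q_noisy = [[0.9, 0.1],   # effect for outcome 0
           [0.1, 0.9]]   # effect for outcome 1
p0, p1 = [1.0, 0.0], [0.25, 0.75]
c0, c1 = classical_probs(q_noisy, p0), classical_probs(q_noisy, p1)

# Qubit: the state |+><+| measured in the computational basis.
rho_plus = [[0.5, 0.5], [0.5, 0.5]]
z_basis = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
q_probs = quantum_probs(z_basis, rho_plus)
```

For the two classical states above, `success_probability(c0, c1)` agrees with `0.5 + 0.5 * statistical_distance(c0, c1)`, which is the content of Eq. (7) for this particular measurement.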

A multi-partite state in box world corresponds
to a joint conditional probability distribution
$P(x^1_{out}, \ldots, x^N_{out} \,|\, x^1_{in}, \ldots, x^N_{in})$ with a separate input
and output for each system. Aside from the usual
constraints of normalisation and positivity, the allowed
states must also satisfy the non-signalling conditions:
that the marginal probability distribution obtained by
summing over $x^k_{out}$,

$$\sum_{x^k_{out}} P(x^1_{out}, \ldots, x^N_{out} \,|\, x^1_{in}, \ldots, x^N_{in}), \tag{15}$$

is independent of $x^k_{in}$ for all k. This means that the other
parties cannot learn anything about a distant party's
measurement choice from their own measurement results.
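
The non-signalling condition (15) is easy to check numerically for two binary systems. The sketch below is an illustration rather than the paper's own construction: it encodes the PR-box distribution of Eq. (16) as a Python function, verifies that each party's marginal is independent of the other party's input, and evaluates the CHSH expression. The function names, the signalling counterexample, and the convention of mapping output bits to ±1 for the correlators are assumptions of this example.

```python
from itertools import product

def pr_box(a, b, x, y):
    """Eq. (16): P(a, b | x, y) = 1/2 if a XOR b == x AND y, else 0."""
    return 0.5 if (a ^ b) == (x & y) else 0.0

def is_non_signalling(P, tol=1e-12):
    """Check that each party's marginal does not depend on the other's input."""
    for a, x in product((0, 1), repeat=2):
        # Alice's marginal P(a | x) must be the same for both choices of y.
        m = [sum(P(a, b, x, y) for b in (0, 1)) for y in (0, 1)]
        if abs(m[0] - m[1]) > tol:
            return False
    for b, y in product((0, 1), repeat=2):
        # Bob's marginal P(b | y) must be the same for both choices of x.
        m = [sum(P(a, b, x, y) for a in (0, 1)) for x in (0, 1)]
        if abs(m[0] - m[1]) > tol:
            return False
    return True

def correlator(P, x, y):
    """E(x, y) = sum_{a,b} (-1)^(a+b) P(a, b | x, y)."""
    return sum((-1) ** (a + b) * P(a, b, x, y)
               for a, b in product((0, 1), repeat=2))

def chsh(P):
    """CHSH expression E(0,0) + E(0,1) + E(1,0) - E(1,1)."""
    return (correlator(P, 0, 0) + correlator(P, 0, 1)
            + correlator(P, 1, 0) - correlator(P, 1, 1))

def signalling_box(a, b, x, y):
    """Counterexample: Bob's output copies Alice's input, so it signals."""
    return 1.0 if (a == 0 and b == x) else 0.0
```

Calling `chsh(pr_box)` gives the algebraic maximum of 4, while `is_non_signalling(signalling_box)` fails because Bob's marginal reveals Alice's input.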
A bipartite state of particular interest is the PR-box
state [2T]-I29|, for which all inputs and outputs are binary,
and the probability distribution is

$$P_{PR}(x^1_{out} x^2_{out} \,|\, x^1_{in} x^2_{in}) = \begin{cases} \frac{1}{2} & : x^1_{out} \oplus x^2_{out} = x^1_{in} \cdot x^2_{in} \\ 0 & : \text{otherwise} \end{cases} \tag{16}$$

where $\oplus$ denotes addition modulo 2. This state is 'more
entangled' than any quantum state, yielding correlations
that achieve the maximum possible value of 4 for
the Clauser-Horne-Shimony-Holt (CHSH) expression [9],
compared to $\le 2\sqrt{2}$ for quantum theory (Tsirelson's
bound [36]), and $\le 2$ for classical probability theory. We
represent entanglement between systems in box world by
a zigzag line between them, and classical correlations (i.e.
separable but non-product states) by a dotted line.

[Figure: box-world diagram of two correlated systems X and Y]

In box world, we allow all mathematically well-defined
measurements and transformations to be physically implemented.
Writing $\mathbf{x}_{out} = (x^1_{out}, \ldots, x^N_{out})$ and
$\mathbf{x}_{in} = (x^1_{in}, \ldots, x^N_{in})$, all effects take the form

$$e_r(S) = \sum_{\mathbf{x}_{out}, \mathbf{x}_{in}} Q_r(\mathbf{x}_{out}|\mathbf{x}_{in})\, P(\mathbf{x}_{out}|\mathbf{x}_{in}), \tag{17}$$

where $Q_r(\mathbf{x}_{out}|\mathbf{x}_{in})$ can be taken to be positive [3]. The
effect $e^{\mathbf{x}'_{in}}_{\mathbf{x}'_{out}}$ corresponding to performing joint fiducial
measurements $\mathbf{x}'_{in}$ and obtaining results $\mathbf{x}'_{out}$ is represented
by $Q_{\mathbf{x}'_{out}}(\mathbf{x}_{out}|\mathbf{x}_{in}) = \delta_{\mathbf{x}_{in}, \mathbf{x}'_{in}}\, \delta_{\mathbf{x}_{out}, \mathbf{x}'_{out}}$. Because of
the positivity of $Q_r$, any effect can be expressed as a
weighted sum of such fiducial measurement effects. It
follows that a measurement is fine-grained if and only
if each of its effects is proportional to some $e^{\mathbf{x}'_{in}}_{\mathbf{x}'_{out}}$, and
that products of fine-grained measurements are themselves
fine-grained.



A. Entropy



We now give a concrete definition of entropy for any
physical theory, which satisfies the above desiderata.
Other definitions are certainly possible, and we will consider
one alternative (based on mixed state decomposition)
in Section IV D. However, the following definition
has many appealing properties.

Given any state $S \in \mathcal{S}$, we define its entropy $\hat{H}(S)$ by

$$\hat{H}(S) := \inf_{e \in \mathcal{E}^*} H(e(S)), \tag{18}$$

where the infimum is taken over all fine-grained measurements
$e \in \mathcal{E}^*$ on the state space $\mathcal{S}$ and
$H(e(S)) = -\sum_{r \in \mathcal{R}_e} e_r(S) \log e_r(S)$ is the Shannon entropy of the
probability distribution e(S) over possible outcomes of
e. This has an intuitive operational meaning as the minimal
output uncertainty of any fine-grained measurement
on the system. Note that for information-gathering purposes,
the best measurements are always fine-grained,
and without restricting to this subset the unit measurement
would always be optimal (giving zero outcome uncertainty).
Furthermore note that trivial refinements of
e always generate a higher output entropy, so it is sufficient
to only consider measurements in the infimum that
have no parallel effects.
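
For intuition, Eq. (18) can be evaluated directly in simple cases. The sketch below rests on two assumptions that go beyond the text above: in the classical case it takes the infimum to be attained by the measurement that reads out the classical value, and for a single box-world system it restricts the infimum to the fiducial measurements, which in general yields only an upper bound on the entropy. The function names and the example box are illustrative.

```python
import math

def shannon(probs):
    """Shannon entropy in bits; outcomes with zero probability contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def classical_entropy(p):
    """Entropy of a classical state p, taken as the Shannon entropy of p."""
    return shannon(p)

def box_entropy_upper_bound(P):
    """Minimum output entropy over the fiducial measurements of a single box.

    P[x_in] is the output distribution P(x_out | x_in). Restricting the
    infimum in Eq. (18) to fiducial measurements gives an upper bound.
    """
    return min(shannon(dist) for dist in P.values())

# A box whose first input gives a deterministic output and whose second
# input gives a uniformly random bit.
box = {0: [1.0, 0.0], 1: [0.5, 0.5]}
uniform_bit = [0.5, 0.5]

h_bit = classical_entropy(uniform_bit)
h_box = box_entropy_upper_bound(box)
```

The example box has one fiducial measurement with a certain outcome, so the bound is zero even though its other fiducial measurement is maximally uncertain.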

In appendix B we prove that $\hat{H}$ retains several important
properties of the Shannon and von Neumann entropy.
In particular, we show:

1. (Reduction) $\hat{H}$ reduces to the Shannon entropy for
classical probability theory, and the von Neumann
entropy for quantum theory.

2. (Positivity and boundedness) Suppose that the minimal
number of outcomes for a fine-grained measurement
in $\mathcal{E}^*$ is d. Then for all states $S \in \mathcal{S}$,

$$\log(d) \ge \hat{H}(S) \ge 0. \tag{19}$$

IV. GENERALIZED ENTROPIES 

The Shannon entropy $H(\vec{p}) = -\sum_i p_i \log p_i$ and von
Neumann entropy $S(\rho) = -\mathrm{tr}(\rho \log \rho)$ are extremely useful
tools for analyzing information processing in a classical
or quantum world. Here, we would like to define
an analogous entropy for general probabilistic theories
which reduces to $H(\vec{p})$ and $S(\rho)$ for classical probability
theory and quantum theory respectively. We would also
like our new entropy to retain as many of the mathematical
properties of the Shannon and von Neumann entropy
as possible. Not only will this help our new entropy conform
to our intuitive notions, but it will make it easier to
prove general results using these quantities, and transfer
known results to the general case. Note that although
we can use any base for the logarithm in the definition of
the Shannon and von Neumann entropies (as long as we
are consistent), in what follows we will use base 2 (i.e.
$\log = \log_2$) throughout.



3. (Concavity) For any $S_1, S_2 \in \mathcal{S}$ and any mixed
state $S_{mix} = pS_1 + (1 - p)S_2 \in \mathcal{S}$:

$$\hat{H}(S_{mix}) \ge p\,\hat{H}(S_1) + (1 - p)\,\hat{H}(S_2). \tag{20}$$

4. (Limited Subadditivity) Consider a theory with
the additional property that fine-grained measurements
remain fine-grained for composite systems:

$$e \in \mathcal{E}^*_A,\ f \in \mathcal{E}^*_B \ \Rightarrow\ e \otimes f \in \mathcal{E}^*_{AB}. \tag{21}$$

This is true in quantum theory, classical theory, and
box world. When (21) holds, then for any bipartite
state $AB \in \mathcal{S}_{AB}$ and reduced states $A \in \mathcal{S}_A$ and
$B \in \mathcal{S}_B$,

$$\hat{H}(A) + \hat{H}(B) \ge \hat{H}(AB). \tag{22}$$

5. (Limited Continuity) Consider a system for which
all allowed measurements have at most D outcomes,
or for which restricting the allowed measurements
to have at most D outcomes does not
change the entropy of any state. This is true in
quantum theory, with $D = d = \dim(\mathcal{H})$, and also in
box world and classical theory. Then we can prove
an analogue of the Fannes inequality [T31 Q2], which
says that the entropy of two states which are close
does not differ by too much. In particular, given
$S_1, S_2 \in \mathcal{S}$ satisfying $\mathcal{D}(S_1, S_2) < 1/e$,

$$|\hat{H}(S_1) - \hat{H}(S_2)| \le \mathcal{D}(S_1, S_2) \log\!\left(\frac{D}{\mathcal{D}(S_1, S_2)}\right). \tag{23}$$
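
Properties 3 and 4 can be sanity-checked in the classical case, where the entropy reduces to the Shannon entropy (property 1). The following sketch, with illustrative helper names and arbitrarily chosen example distributions, evaluates both sides of Eqs. (20) and (22). It is a numerical spot check for one example, not a proof.

```python
import math

def shannon(probs):
    """Shannon entropy in bits; outcomes with zero probability contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def mix(p, s1, s2):
    """Probabilistic mixture p * s1 + (1 - p) * s2 of two classical states."""
    return [p * a + (1 - p) * b for a, b in zip(s1, s2)]

def marginals(joint):
    """Marginal distributions of a bipartite classical state joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return pa, pb

# Concavity, Eq. (20).
s1, s2, p = [0.9, 0.1], [0.2, 0.8], 0.3
lhs_concavity = shannon(mix(p, s1, s2))
rhs_concavity = p * shannon(s1) + (1 - p) * shannon(s2)

# Subadditivity, Eq. (22), for a correlated pair of bits.
joint = [[0.4, 0.1], [0.1, 0.4]]
pa, pb = marginals(joint)
h_joint = shannon([x for row in joint for x in row])
h_sum = shannon(pa) + shannon(pb)
```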



We will also see in section VIII that $\hat{H}$ has an appealing
operational interpretation as a measure of compressibility
for some theories.

However, one property of the von Neumann entropy
that does not carry over to $\hat{H}$ is strong subadditivity [21].
In particular, we will see in section V that there exists a
tripartite state in box world such that

$$\hat{H}(ABC) + \hat{H}(C) > \hat{H}(AC) + \hat{H}(BC). \tag{24}$$
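
By contrast with Eq. (24), strong subadditivity always holds for classical probability distributions. The sketch below checks this for one arbitrarily chosen tripartite distribution, and also evaluates the classical conditional entropy H(A|C) = H(AC) - H(C) and mutual information I(A;C) = H(A) + H(C) - H(AC). The distribution and the helper names are assumptions made for illustration only.

```python
import math
from itertools import product

def shannon(probs):
    """Shannon entropy in bits; outcomes with zero probability contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def marginal(joint, keep):
    """Marginalise a dict {(a, b, c): prob} onto the positions listed in keep."""
    out = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out

def H(joint, keep):
    """Shannon entropy of the marginal on the subsystems in keep."""
    return shannon(marginal(joint, keep).values())

# C is a uniform bit, A is a noisy copy of C, and B is a noisy copy of A
# (each copy is flipped with probability 0.1).
joint = {}
for a, b, c in product((0, 1), repeat=3):
    p_a_given_c = 0.9 if a == c else 0.1
    p_b_given_a = 0.9 if b == a else 0.1
    joint[(a, b, c)] = 0.5 * p_a_given_c * p_b_given_a

A, B, C = 0, 1, 2
ssa_lhs = H(joint, (A, B, C)) + H(joint, (C,))
ssa_rhs = H(joint, (A, C)) + H(joint, (B, C))
cond_A_given_C = H(joint, (A, C)) - H(joint, (C,))
mutual_info_AC = H(joint, (A,)) + H(joint, (C,)) - H(joint, (A, C))
```

In this example `ssa_lhs` is strictly smaller than `ssa_rhs`, because A and B remain correlated even when C is known.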



B. Conditional entropy and mutual information

1. A standard definition

Based on the entropy $\hat{H}$, we can also define a notion of conditional entropy. In analogy to the von Neumann entropy [8], we define the conditional entropy of a general bipartite state $AB \in \mathcal{S}_{AB}$ with reduced states $A \in \mathcal{S}_A$ and $B \in \mathcal{S}_B$ by

$$\hat{H}(A|B) := \hat{H}(AB) - \hat{H}(B). \tag{25}$$

This has the nice property that for quantum or classical systems it reduces to the conditional von Neumann and Shannon entropies respectively. In some theories (including quantum theory but not classical probability theory), $\hat{H}(A|B)$ can be negative, which is strange, but opens the way for an appealing operational interpretation as in the quantum setting [20].

However, unlike in quantum theory, we will see that $\hat{H}(\cdot|\cdot)$ has the counterintuitive property that it can decrease when 'forgetting' information in some probabilistic theories. In particular, the violation of strong subadditivity for $\hat{H}$ in box world implies that it is possible to obtain $\hat{H}(A|BC) > \hat{H}(A|B)$, and that $\hat{H}(\cdot|\cdot)$ is not subadditive. These properties will motivate us to consider an alternative definition of the conditional entropy below. However, we will show that no 'reasonable' entropy in box world can have all the appealing properties of the conditional von Neumann entropy.

In analogy to the quantum case, we can also define the mutual information via

$$\hat{I}(A;B) := \hat{H}(A) + \hat{H}(B) - \hat{H}(AB) = \hat{H}(A) - \hat{H}(A|B) = \hat{H}(B) - \hat{H}(B|A). \tag{26}$$

This quantity will be positive whenever subadditivity holds, and reduces to the usual mutual information in the quantum and classical case. Similarly, we may define a notion of accessible information analogous to the quantum setting as

$$\hat{I}_{\rm acc}(A;B) := \sup_{e \in \mathcal{E}_A,\, f \in \mathcal{E}_B} I(e(A); f(B)), \tag{27}$$

where $I$ is the classical mutual information.

2. An alternative definition

Given the problems observed with the previous definition in some theories, we now define a second form of conditional entropy based on $\hat{H}$, which sometimes captures our intuitive notions about information in a nicer way. For any bipartite state $AB \in \mathcal{S}_{AB}$ with reduced states $A \in \mathcal{S}_A$ and $B \in \mathcal{S}_B$ we define

$$\hat{H}_+(A|B) := \inf_{f \in \mathcal{E}_B} \sum_j f_j(B)\, \hat{H}(A_{f,j}), \tag{28}$$



where the infimum is taken over all measurements on $B$, and $A_{f,j}$ is the reduced state of the first system conditioned on obtaining measurement outcome $j$ when performing $f$ on the second system. This definition has the appealing property that conditioning on more systems always reduces the entropy, that is, $\hat{H}(A) \ge \hat{H}_+(A|B) \ge \hat{H}_+(A|BC)$ (see the appendix, Lemma C.1), and it reduces to the conditional Shannon entropy in the classical case. Note, however, that $\hat{H}_+(\cdot|\cdot)$ does not reduce to the conditional von Neumann entropy in the quantum setting, as it is always positive. Furthermore, we will see in section [VI] that it is not subadditive, and does not obey the usual chain rule (even though a limited form of chain rule holds in box world, as we show in the appendix, Section C 2). Nevertheless $\hat{H}_+(\cdot|\cdot)$ seems quite a natural entropic quantity, and its corresponding quantum version has found an interesting application in the study of quantum correlations [12].

We can also define a corresponding information quantity via

$$\hat{I}_+(A;B) = \hat{H}(A) - \hat{H}_+(A|B), \tag{29}$$

which is always positive. However, unlike $\hat{I}(A;B)$, this definition is not symmetric and hence it cannot really be considered 'mutual information'. Instead, $\hat{I}_+(A;B)$ captures the amount of information that $B$ holds about $A$.
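As a small illustration of the classical case mentioned above, the following sketch evaluates the sum in (28) for the single fine-grained measurement of a classical system $B$ and compares it with $H(AB) - H(B)$. The joint distribution (a fair bit $A$ and a noisy copy $B$ with flip probability 0.1) and all function names are hypothetical choices made for this example; the code does not search over coarse-grained measurements.

```python
from math import log2

def shannon(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)

def averaged_conditional_entropy(joint):
    """sum_b p(b) H(A | B = b) for a joint distribution joint[(a, b)]."""
    p_b = {}
    for (a, b), p in joint.items():
        p_b[b] = p_b.get(b, 0.0) + p
    total = 0.0
    for b, pb in p_b.items():
        conditional = [p / pb for (a, bb), p in joint.items() if bb == b]
        total += pb * shannon(conditional)
    return total

# Hypothetical example: A is a fair bit, B is a copy of A flipped with probability 0.1.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

h_a = shannon([0.5, 0.5])
h_b = shannon([0.5, 0.5])
h_ab = shannon(joint.values())

h_plus = averaged_conditional_entropy(joint)   # value of the sum in (28)
i_plus = h_a - h_plus                          # corresponding value of (29)
```

In this classical example `h_plus` agrees with `h_ab - h_b`, so the two definitions of conditional entropy coincide, and `i_plus` lies between 0 and 1 bit.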



C. Other entropic quantities

For cryptographic purposes, such as in the setting of device independent security for quantum key distribution, it is useful to define the following Rényi entropic variants of $\hat{H}$. More precisely, we define

$$\hat{H}_\alpha(S) := \inf_{e \in \mathcal{E}^*} H_\alpha(e(S)), \tag{30}$$

where $H_\alpha(e(S)) = \frac{1}{1-\alpha} \log\left(\sum_j e_j(S)^\alpha\right)$ is the Rényi entropy of order $\alpha$. Note that $\hat{H}_1(S) = \hat{H}(S)$ (taking the limit $\alpha \to 1$). These quantities can also be useful in order to bound the value of $\hat{H}(\cdot)$ itself, since the Rényi entropy is non-increasing in its order: for any state $S \in \mathcal{S}$ and $\alpha \le \beta$ we have $\hat{H}_\alpha(S) \ge \hat{H}_\beta(S)$.

To define a notion of relative entropy, we adopt a purely operational viewpoint. Suppose we are given $N$ copies of a state $S_1$ or a state $S_2$, and let

$$S_1^N := S_1^{\otimes N}, \qquad S_2^N := S_2^{\otimes N}.$$

Classically, as well as quantumly, the relative entropy captures our ability to distinguish $S_1^N$ from $S_2^N$ for large $N$. Note that to distinguish the two cases, it is sufficient to coarse grain any measurement to a two outcome measurement $e = \{(1, e_1), (2, e_2)\}$, where without loss of generality we associate the outcome '1' with the state $S_1^N$ and '2' with $S_2^N$. Then $e_1(S_2^N)$ denotes the probability that we conclude that the state was $S_1$, when really we were given $S_2$. Similarly, $e_2(S_1^N)$ denotes the probability that we falsely conclude that the state was $S_2$. In what is called asymmetric hypothesis testing, we wish to minimize the error $e_1(S_2^N)$ while simultaneously demanding that $e_2(S_1^N)$ is bounded from above by a parameter $\epsilon$. Here we fix $\epsilon = 1/2$. We therefore want to determine

$$p_N := \inf_e \left\{ e_1(S_2^N) \mid e_2(S_1^N) \le 1/2 \right\}. \tag{31}$$

In a quantum setting, it has been shown that the quantum relative entropy is directly related to this quantity via the quantum Stein's lemma [HQ23I5S], which states that we have

$$D(S_1 \| S_2) = \lim_{N \to \infty} -\frac{\log p_N}{N}. \tag{32}$$

This is a deep result giving a clear operational interpretation to the relative entropy, telling us that in the large $N$ limit the error probability $p_N$ decreases exponentially in $N$ at a rate given by $D(S_1 \| S_2)$. Furthermore, as it is expressed in operational terms, we can simply adopt (32) as our definition of relative entropy in any theory for which the limit is well defined. Thus we recover the usual value in the quantum (and classical) case, and in all other theories we still capture the same operational interpretation.

Note also that our choice of $\epsilon = 1/2$ was quite arbitrary, and one may consider a family of relative entropies, one for each choice of $\epsilon$. In quantum theory, these are all equivalent but they may yield different values in other theories.

D. Decomposition entropy



Although the entropy $\hat{H}$ has several appealing properties, and seems quite intuitive, it is nevertheless interesting to consider alternative notions of entropy for general theories. One seemingly natural alternative is the decomposition entropy, which measures the mixedness of a state.

There is a special subset of states $\mathcal{S}^* \subset \mathcal{S}$ which cannot be obtained by mixing other states:

$$S \in \mathcal{S}^* \Leftrightarrow \nexists\, S_1 \neq S_2 \in \mathcal{S},\ p \in (0,1) : S = p S_1 + (1-p) S_2. \tag{33}$$

The states in $\mathcal{S}^*$ form the extreme points of $\mathcal{S}$ and are referred to as pure states (with the remaining states being mixed). Suppose that any state in $\mathcal{S}$ can be decomposed into a finite sum of pure states. Then we can define the entropy of a state by the minimal Shannon entropy of its decompositions into pure states. Define a decomposition $D(S)$ of a state $S \in \mathcal{S}$ as a probability distribution over the set of pure states that is non-zero for only a finite set of states $S_i \in \mathcal{S}^*$ with probabilities $p_i \in (0,1]$ such that $\sum_i p_i S_i = S$. Then define the decomposition entropy as

$$\check{H}(S) := \inf_{D(S)} H(D(S)). \tag{34}$$



Like our previous entropy definition, we show in appendix [D] that $\check{H}$ reduces to the Shannon and von Neumann entropy in classical probability theory and quantum theory respectively. However, it has a number of unappealing properties when compared with $\hat{H}$. In particular it is neither concave nor subadditive, as revealed by explicit counterexamples from box world given in appendix [D].

After studying simple examples in box world, it seems that $\check{H}$ is a less intuitive and helpful measure of uncertainty than $\hat{H}$. For this reason, although $\check{H}$ may play an important role in discussions of entanglement or purity in many generalized theories, and may also lead to interesting operational interpretations, we do not discuss it further here.



V. EXAMPLES IN BOX WORLD 

We now investigate how our entropic quantity $\hat{H}(\cdot)$ behaves in box world with a simple, yet illustrative, example.

To first gain some intuition on how $\hat{H}$ behaves in such a setting, consider a trivial classical system $X$ which admits only one possible measurement and outputs 2 possible values $x_{\rm out} \in \{0, 1\}$, each with probability 1/2. Clearly, since the system admits only one possible measurement $e$, we have

$$\hat{H}(X) = H(e(X)) = H((1/2, 1/2)) = 1. \tag{35}$$

Consider now a PR-box (a bipartite system $YZ$ in the state (16)), where Alice holds system $Y$ (with binary input $y_{\rm in}$ and output $y_{\rm out}$) and Bob holds system $Z$ (with binary input $z_{\rm in}$ and output $z_{\rm out}$). Note that the fine-grained measurements on the entire system correspond to a sequence of fiducial measurements on the two subsystems (where the choice of input to the second subsystem may depend on the output of the first) [3], and the outcome is the output of both measurements. The minimal entropy for the joint system can be obtained by inputting '0' into both boxes, giving outputs '00' or '11' each with probability 1/2 (in fact, any other fine-grained measurement is equally good), and the marginal states yield a random output bit for any input. Hence we have that

$$\hat{H}(Y) = \hat{H}(Z) = \hat{H}(YZ) = 1. \tag{36}$$
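The claim that every fine-grained measurement on the PR-box gives one bit of output entropy can be checked by brute force. The sketch below is only an illustration: it assumes the standard PR-box statistics (uniform outputs with $y_{\rm out} \oplus z_{\rm out} = y_{\rm in} \cdot z_{\rm in}$) and models a fine-grained measurement as an input to one box followed by an input to the other box that may depend on the first output. The function names are hypothetical.

```python
from itertools import product
from math import log2

def shannon(probs):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

def pr_box(y_out, z_out, y_in, z_in):
    """Assumed PR-box statistics: uniform outputs with y_out XOR z_out = y_in AND z_in."""
    return 0.5 if (y_out ^ z_out) == (y_in & z_in) else 0.0

def joint_entropies():
    """Output entropy of every adaptive sequence of fiducial measurements on YZ."""
    values = []
    for first, a, g0, g1 in product("YZ", (0, 1), (0, 1), (0, 1)):
        g = (g0, g1)  # input to the second box as a function of the first output
        dist = {}
        for y_out, z_out in product((0, 1), repeat=2):
            if first == "Y":
                y_in, z_in = a, g[y_out]
            else:
                z_in, y_in = a, g[z_out]
            p = pr_box(y_out, z_out, y_in, z_in)
            dist[(y_out, z_out)] = dist.get((y_out, z_out), 0.0) + p
        values.append(shannon(dist.values()))
    return values

entropies = joint_entropies()
h_yz = min(entropies)   # infimum over the enumerated measurements
```

All enumerated measurements give the same value, so `min(entropies)` and `max(entropies)` coincide at one bit, in line with (36).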



We now consider a scenario for which it is known that PR-boxes yield an advantage over the quantum setting in terms of information processing. The basis of our example is a simple non-local game in which Alice is given a random 'parity' bit $x$, and has to output two bits $x_0$ and $x_1$ satisfying $x_0 \oplus x_1 = x$ (where $\oplus$ denotes addition modulo 2). Then, without receiving any communication from Alice, Bob is given a random target bit $t$ and has to successfully output $x_t$ [13]. This game is equivalent to the CHSH-game 0E5].

We begin with Alice having the parity bit (which we model by a classical box in the state $X$ described above), and Alice and Bob sharing a PR-box in the state $YZ$.

Now Alice performs the following procedure, which corresponds to an allowed transformation in box world. She measures the parity bit $X$ to obtain $x := x_{\rm out}$, then uses this as the input to her part of the PR-box, setting $y_{\rm in} = x$ and obtaining outcome $y_{\rm out}$. Finally, she prepares two new classical bits $x_0 = y_{\rm out}$ and $x_1 = x \oplus y_{\rm out}$ (represented by classical boxes $X_0, X_1$). Note that because of the correlations inherent in the PR box, the output of Bob's system will now be described by $z_{\rm out} = y_{\rm in} \cdot z_{\rm in} \oplus y_{\rm out} = (x_0 \oplus x_1) \cdot z_{\rm in} \oplus x_0 = x_{z_{\rm in}}$. Hence the state of $X_0 X_1 Z$ after this procedure is the classically correlated state:

$$P(x_0 x_1 z_{\rm out} | z_{\rm in}) = \begin{cases} \frac{1}{4} & : z_{\rm out} = x_{z_{\rm in}} \\ 0 & : \text{otherwise.} \end{cases} \tag{37}$$

Given any target bit $t$, Bob can win the game by setting $z_{\rm in} = t$ and outputting the result $z_{\rm out} = x_t$. We can think of Bob's system as a perfect random access encoding of the two-bit string $x_0 x_1$ [331 13"8].
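The protocol can be checked exhaustively. The following sketch is an illustration under the assumption that the PR-box obeys $y_{\rm out} \oplus z_{\rm out} = y_{\rm in} \cdot z_{\rm in}$; the function `run_protocol` and its argument names are hypothetical. It enumerates every parity bit, every PR-box output on Alice's side and every target bit, and verifies that Alice's bits have the right parity and that Bob outputs $x_t$.

```python
from itertools import product

def run_protocol(x, y_out, t):
    """One run of the game: x is Alice's parity bit, y_out her PR-box output, t Bob's target."""
    y_in = x                        # Alice feeds the parity bit into her half of the PR-box
    x0, x1 = y_out, x ^ y_out       # the two bits Alice prepares
    z_in = t                        # Bob feeds his target bit into his half
    z_out = (y_in & z_in) ^ y_out   # PR-box correlation: y_out XOR z_out = y_in AND z_in
    return x0, x1, z_out

cases = list(product((0, 1), repeat=3))
bob_always_wins = all(
    (x0 ^ x1) == x and z_out == (x0, x1)[t]
    for (x, y_out, t) in cases
    for (x0, x1, z_out) in [run_protocol(x, y_out, t)]
)
```

Because the check covers all eight combinations of `x`, `y_out` and `t`, a value of `True` for `bob_always_wins` confirms the identity $z_{\rm out} = x_{z_{\rm in}}$ used above.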

Consider the entropies of the state $X_0 X_1 Z$. All of the individual systems yield a random output bit, giving

$$\hat{H}(X_0) = \hat{H}(X_1) = \hat{H}(Z) = 1, \tag{38}$$

and $X_0$ and $X_1$ are independent random bits, so

$$\hat{H}(X_0 X_1) = 2. \tag{39}$$

Also note that we have

$$\hat{H}(X_0 X_1 Z) = 2, \tag{40}$$

since for any input $z_{\rm in}$, the output $z_{\rm out}$ will be perfectly correlated with one of the other bits (giving only 2 independent random output bits). Finally, because we can make $z_{\rm out}$ perfectly correlated with either of the remaining bits we have

$$\hat{H}(X_0 Z) = \hat{H}(X_1 Z) = 1, \tag{41}$$

where the optimal measurements are $z_{\rm in} = 0$ and $z_{\rm in} = 1$ respectively.

These entropy values all seem very intuitive (note in contrast that for the decomposition entropy $\check{H}(X_0 Z) = 2$). However, they violate several natural properties of the Shannon and von Neumann entropies.

(a) Strong subadditivity. First of all, it is easy to see from the above that

$$\hat{H}(X_0 X_1 Z) + \hat{H}(Z) > \hat{H}(X_0 Z) + \hat{H}(X_1 Z), \tag{42}$$

which violates strong subadditivity. We now turn to the two possible forms of conditional entropy that we defined, where our simple example clearly illustrates their differences.
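The entropy values (38)-(42) can be reproduced numerically. The sketch below is an illustration, not part of the original analysis: it takes the state (37) as given and assumes that a fine-grained measurement on a subset of systems reads out the classical bits in that subset and then chooses Bob's input as a function of those bits. The helper names `state` and `h_hat` are hypothetical.

```python
from itertools import product
from math import log2

def shannon(probs):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

def state(x0, x1, z_out, z_in):
    """State (37): X0 and X1 are uniform bits and Z outputs x_{z_in}."""
    return 0.25 if z_out == (x0, x1)[z_in] else 0.0

def h_hat(systems):
    """Minimal output entropy of a subset of {'X0', 'X1', 'Z'}."""
    seen_names = [s for s in ("X0", "X1") if s in systems]
    keys = list(product((0, 1), repeat=len(seen_names)))
    has_z = "Z" in systems
    # Bob's input may depend on the classical bits that belong to the measured subset.
    strategies = list(product((0, 1), repeat=len(keys))) if has_z else [(0,) * len(keys)]
    best = float("inf")
    for strategy in strategies:
        z_in_for = dict(zip(keys, strategy))
        dist = {}
        for x0, x1, z_out in product((0, 1), repeat=3):
            bits = {"X0": x0, "X1": x1}
            seen = tuple(bits[s] for s in seen_names)
            outcome = seen + ((z_out,) if has_z else ())
            p = state(x0, x1, z_out, z_in_for[seen])
            dist[outcome] = dist.get(outcome, 0.0) + p
        best = min(best, shannon(dist.values()))
    return best

subsets = ["X0", "X1", "Z", "X0 X1", "X0 Z", "X1 Z", "X0 X1 Z"]
H = {name: h_hat(set(name.split())) for name in subsets}

# A positive gap means that strong subadditivity, as tested in (42), is violated.
ssa_gap = H["X0 X1 Z"] + H["Z"] - H["X0 Z"] - H["X1 Z"]
```

Under these assumptions the dictionary `H` contains the values listed in (38)-(41), and `ssa_gap` equals one bit.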



A. Standard conditional entropy 

First of all, we consider the standard form of conditional entropy, which reduces to the conditional von Neumann entropy in the quantum setting. By the above, we can immediately see that it has the following interesting properties.

(b) Subadditivity of the conditional entropy. Using (25) we deduce that

$$\hat{H}(X_0|Z) = \hat{H}(X_1|Z) = 0, \qquad \hat{H}(X_0 X_1|Z) = 1, \tag{43}$$

which seems intuitive, as we can perfectly predict the output of either $X_0$ or $X_1$ (but not both) using $Z$. However, this yields a violation of subadditivity for the conditional entropy, as

$$\hat{H}(X_0 X_1|Z) > \hat{H}(X_0|Z) + \hat{H}(X_1|Z). \tag{44}$$

This may seem rather bizarre at first glance; however, we will see in Section [VI] that no 'reasonable' measure of conditional entropy in box world is subadditive, unlike the von Neumann entropy.

It is also interesting to consider the corresponding mutual information quantities, which are

$$\hat{I}(X_0; Z) = \hat{I}(X_1; Z) = \hat{I}(X_0 X_1; Z) = 1. \tag{45}$$

Again, these seem intuitive, as we can extract one bit of information about either $X_0$ or $X_1$ or the pair $X_0 X_1$ from $Z$.

It may be tempting to conclude that the point at which $\hat{H}(X_0 X_1|Z)$ becomes subadditive (or equivalently, where $\hat{H}(X_0 X_1 Z)$ becomes strongly subadditive) is exactly when the PR-box is weakened to obey Tsirelson's bound. Note that our trivial example only shows that PR-boxes which are more than $0.89 > 1/2 + 1/(2\sqrt{2})$ correct do not obey subadditivity. However, note that constraining non-local boxes to obey Tsirelson's bound alone is insufficient to reduce box world to quantum theory (e.g. each quantum system admits a continuum of fine-grained measurements whereas any box admits only a finite set).

(c) Conditioning can increase entropy. Our small example also emphasizes another curious property of the conditional entropy. By definition,

$$\hat{H}(X_0|X_1 Z) = \hat{H}(X_0 X_1 Z) - \hat{H}(X_1 Z) = 1. \tag{46}$$

But this is strange, because we can perfectly determine the output of $X_0$ given $Z$. Furthermore, since $\hat{H}(X_0|Z) = 0$, we then clearly have

$$\hat{H}(X_0|X_1 Z) > \hat{H}(X_0|Z), \tag{47}$$

which means that 'forgetting information', namely discarding $X_1$, can decrease uncertainty. Again, it may seem that this is a consequence of not choosing the 'correct' definition of entropy.

B. Alternative conditional entropy

Reevaluating the conditional entropies of the previous section using this new definition we find that

$$\hat{H}_+(X_0|Z) = \hat{H}_+(X_1|Z) = 0, \qquad \hat{H}_+(X_0 X_1|Z) = 1, \tag{48}$$

as before, hence this new measure still violates subadditivity. However we now have

$$\hat{H}_+(X_0|Z X_1) = 0, \tag{49}$$

as we would intuitively expect. This means that conditioning on $X_1$ no longer increases the entropy. However, it generates a violation of the chain rule

$$\hat{H}_+(X_0 X_1|Z) \neq \hat{H}_+(X_1|Z) + \hat{H}_+(X_0|X_1 Z). \tag{50}$$

On balance though, this measure of conditional entropy seems more reasonable than the original one in this example.
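The values of the alternative conditional entropy in this example can also be obtained by enumeration. The sketch below evaluates the sum in (28) for the state (37), assuming that it is enough to search over fine-grained measurements in which Bob's input may depend on any classical bits he conditions on; coarse-grained measurements are not searched. The function `h_plus` and its arguments are hypothetical names used only here.

```python
from itertools import product
from math import log2

def shannon(probs):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

def state(x0, x1, z_out, z_in):
    """State (37): X0 and X1 are uniform bits and Z outputs x_{z_in}."""
    return 0.25 if z_out == (x0, x1)[z_in] else 0.0

def h_plus(target, cond_bits):
    """Evaluate (28) for classical `target` bits, conditioning on Z and on `cond_bits`."""
    keys = list(product((0, 1), repeat=len(cond_bits)))
    best = float("inf")
    for strategy in product((0, 1), repeat=len(keys)):
        z_in_for = dict(zip(keys, strategy))
        joint = {}   # (outcome j on the conditioning systems, target value) -> probability
        for x0, x1, z_out in product((0, 1), repeat=3):
            bits = {"X0": x0, "X1": x1}
            seen = tuple(bits[b] for b in cond_bits)
            j = seen + (z_out,)
            a = tuple(bits[b] for b in target)
            p = state(x0, x1, z_out, z_in_for[seen])
            joint[(j, a)] = joint.get((j, a), 0.0) + p
        p_j = {}
        for (j, a), p in joint.items():
            p_j[j] = p_j.get(j, 0.0) + p
        average = sum(
            pj * shannon([p / pj for (jj, a), p in joint.items() if jj == j])
            for j, pj in p_j.items() if pj > 0
        )
        best = min(best, average)
    return best

h_x0_given_z = h_plus(("X0",), ())
h_x1_given_z = h_plus(("X1",), ())
h_x0x1_given_z = h_plus(("X0", "X1"), ())
h_x0_given_zx1 = h_plus(("X0",), ("X1",))
```

With these assumptions the four quantities reproduce (48) and (49), and the left- and right-hand sides of (50) differ by one bit.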



VI. PROPERTIES OF CONDITIONAL 
ENTROPIES IN BOX WORLD 

We now show that any 'reasonable' measure of the conditional entropy in box world will necessarily defy our intuition about information in several ways.

Intuitively, the goal of any entropic quantity is to capture the degree of uncertainty we have about a system, possibly given access to some additional information. We assign a label $A$ to the system of interest and use $B$ to denote any additional systems or information available to us. For simplicity, let us suppose that $A$ corresponds to some classical information (i.e. it is a state of a classical box). Let $H(A|B)$ denote some entropic quantity that quantifies our uncertainty about $A$ given $B$. If we were able to determine $A$ with certainty given access to $B$ (i.e. to determine the precise output of the classical box $A$), we would intuitively say that there is no uncertainty and the quantity $H(A|B)$ should vanish. Conversely, if we cannot determine $A$ given $B$, but will necessarily have some residual uncertainty, then the quantity $H(A|B)$ should be positive. Motivated by this intuition in quantifying uncertainty we demand the following two properties to hold for any 'reasonable' measure of uncertainty when $A$ is classical.

{1} If the output of $A$ can be obtained from $B$ with certainty, $H(A|B) = 0$.

{2} If the output of $A$ cannot be obtained from $B$ with certainty, then $H(A|B) > 0$.

In the classical and quantum world, all commonly used entropic quantities satisfy these conditions (given that $A$ is classical). In both such worlds, there also exist entropic quantities that are subadditive and obey a chain rule, for example the conditional Shannon and von Neumann entropies. In box world, $\hat{H}_+(A|B)$ is 'reasonable' according to this definition, while $\hat{H}(A|B)$ is 'unreasonable'. Curiously, it turns out that in box world there cannot be any reasonable measure of conditional entropy that obeys conditions {1} and {2}, but at the same time is subadditive or obeys a chain rule.

(a) Subadditivity of the conditional entropy. Consider the state of the two classical bits $A = X_0 X_1$ and Bob's binary input/output box $B = Z$ described by (37) in the previous section. We now show that in this case no reasonable measure of entropy that obeys properties {1} and {2} is subadditive. First of all, note that Bob can determine one of the bits perfectly, given access to $Z$. Therefore from condition {1}, we obtain that

$$H(X_0|Z) = H(X_1|Z) = 0. \tag{51}$$

However, since Bob cannot determine the parity of the two bits, he certainly cannot learn both bits perfectly and hence from condition {2} we have

$$H(X_0 X_1|Z) > 0. \tag{52}$$

In order for subadditivity to hold, we would need that

$$H(X_0 X_1|Z) \le H(X_0|Z) + H(X_1|Z), \tag{53}$$

which using (51) and (52) leads to a contradiction.

Note that subadditivity could still hold, if the quantity $H(X_0 X_1|Z)$ were negative.

(b) Chain rule for the conditional entropy. We now show that a chain rule is impossible in box world for any entropic quantity that satisfies {1} and {2}. In fact, for the purposes of this proof it is sufficient to replace {2} by the weaker assumption

{2'} If the output of $A$ cannot be obtained from $B$ with certainty, then $H(A|B) \neq 0$.

Note that for the state described by (37), condition {1} gives us

$$H(X_0|Z, X_1) = H(X_0|Z) = 0, \tag{54}$$

because $x_0$ can be obtained perfectly from $B = Z$ or $B = Z X_1$. A chain rule for the conditional entropy would mean that

$$H(X_0 X_1|Z) = H(X_1|Z) + H(X_0|Z, X_1). \tag{55}$$

Using Eq. (54), together with Eqs. (51) and (52), again gives us a contradiction. Note that $\hat{H}_+(\cdot|\cdot)$ obeys conditions {1} and {2}, and hence does not admit a chain rule in box world.

As $\hat{H}(\cdot|\cdot)$ satisfies a chain rule, it follows from the above that it must be 'unreasonable'. Indeed, this can be seen from the fact that $\hat{H}(X_0|X_1 Z) = 1$ despite the fact that we can perfectly determine the output of $X_0$ given $Z$ and $X_1$, violating condition {1}. It is easy to see that if we drop the conditions that make an entropy 'reasonable', and instead only assume that it is not subadditive while still enforcing a chain rule, then conditioning can increase entropy.



VII. INFORMATION CAUSALITY 

We now use our entropic quantities to investigate the game given in [26]. This task relates to 'information causality', which is expressed as the principle that 'communication of $k$ classical bits causes information gain of at most $k$ bits'. In [26] it is reported that this principle can be violated in box world using the following simple game (where we take $k = 1$): Alice is given two random classical bits $a_0$ and $a_1$ and Bob is given a single random bit $t$. Alice is allowed to send a single bit message $m$ to Bob, after which he must output a bit $b$. The couple succeed in the task if $b = a_t$.

This task is clearly very similar to the non-local game considered in section [V]. Indeed, any solution to the previous problem can also be used to solve this one. Alice takes the parity bit as $x = a_0 \oplus a_1$, then generates $x_0$ and $x_1 = x_0 \oplus x$ as before. She sends the message $m = x_0 \oplus a_0$ to Bob. Using the previous protocol, Bob generates $x_t$, and then outputs $b = x_t \oplus m = a_t$.
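The reduction to the earlier game can be verified exhaustively. This is a minimal sketch assuming the PR-box relation $y_{\rm out} \oplus z_{\rm out} = y_{\rm in} \cdot z_{\rm in}$; the function `play` and its argument names are hypothetical.

```python
from itertools import product

def play(a0, a1, t, y_out):
    """One round of the game; y_out is the (random) output of Alice's half of the PR-box."""
    x = a0 ^ a1                 # parity bit used as Alice's PR-box input
    x0 = y_out                  # first bit of the earlier protocol
    m = x0 ^ a0                 # the single bit Alice sends to Bob
    z_out = (x & t) ^ y_out     # Bob inputs z_in = t and obtains x_t
    return z_out ^ m            # Bob's final guess b

always_correct = all(
    play(a0, a1, t, y_out) == (a0, a1)[t]
    for a0, a1, t, y_out in product((0, 1), repeat=4)
)
```

Since all sixteen combinations are covered, `always_correct` being `True` confirms that Bob recovers $a_t$ for either choice of $t$ from a one-bit message.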

In the context of this game, 'information causality' is interpreted as meaning that

$$I := I(a_0; b|t = 0) + I(a_1; b|t = 1) \le 1, \tag{56}$$

where $I(\cdot;\cdot|\cdot)$ is the classical conditional mutual information. This inequality is obeyed in quantum theory. However, given the above argument it is clear that it can be violated in box world, as Alice and Bob can achieve $I = 2$.



Let us examine why (56) fails in terms of our general entropies. We consider the state just after Bob has received the message from Alice, when she holds classical bits $A_0$ and $A_1$, and Bob holds the classical message $M$ and his part of the PR-box $Z$. This state is described by

$$P(a_0 a_1 m z_{\rm out} | z_{\rm in}) = \begin{cases} \frac{1}{8} & : z_{\rm out} = a_{z_{\rm in}} \oplus m \\ 0 & : \text{otherwise.} \end{cases} \tag{57}$$

We can compute entropies explicitly in this case as in section [V] and will obtain similar results. However, [26] also contains a proof of (56) in quantum theory based on the quantum mutual information. It is interesting to attempt to follow this proof using our general mutual information $\hat{I}$ (or $\hat{I}_+$) to see where it fails.

The quantum proof relies on the chain rule for quantum mutual information (which $\hat{I}$ satisfies by definition) [48], positivity of the mutual information (which is true for $\hat{I}$ in box world due to the subadditivity of $\hat{H}$), and non-signalling (which is one of the defining features of box world). However, the crucial step is a use of the data processing inequality to deduce that

$$\hat{I}(A_0; A_1 M Z) \ge \hat{I}(A_0; M Z). \tag{58}$$

Although it is very natural that 'forgetting' $A_1$ can only decrease the mutual information, this inequality is violated in box world. Indeed, for the state (57) we find

$$\hat{I}(A_0; A_1 M Z) = 0, \qquad \hat{I}(A_0; M Z) = 1. \tag{59}$$

This is again a consequence of the violation of strong subadditivity for $\hat{H}$, which forms the key ingredient in why (56) can be violated in box world.
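The values in (59) can be reproduced by brute force from the state (57). The sketch below is illustrative and rests on the same modelling assumption as before: a fine-grained measurement reads the classical bits in the chosen subset and then picks Bob's input as a function of them. The helper names `h_hat` and `mutual_info` are hypothetical.

```python
from itertools import product
from math import log2

CLASSICAL = ("A0", "A1", "M")

def shannon(probs):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

def state(a0, a1, m, z_out, z_in):
    """State (57): three uniform bits, and Z outputs a_{z_in} XOR m."""
    return 0.125 if z_out == ((a0, a1)[z_in] ^ m) else 0.0

def h_hat(systems):
    """Minimal output entropy of a subset of {'A0', 'A1', 'M', 'Z'}."""
    present = [s for s in CLASSICAL if s in systems]
    keys = list(product((0, 1), repeat=len(present)))
    has_z = "Z" in systems
    strategies = product((0, 1), repeat=len(keys)) if has_z else [(0,) * len(keys)]
    best = float("inf")
    for strategy in strategies:
        z_in_for = dict(zip(keys, strategy))
        dist = {}
        for a0, a1, m, z_out in product((0, 1), repeat=4):
            bits = dict(zip(CLASSICAL, (a0, a1, m)))
            seen = tuple(bits[s] for s in present)
            outcome = seen + ((z_out,) if has_z else ())
            p = state(a0, a1, m, z_out, z_in_for[seen])
            dist[outcome] = dist.get(outcome, 0.0) + p
        best = min(best, shannon(dist.values()))
    return best

def mutual_info(a, b):
    """General mutual information (26) built from h_hat."""
    return h_hat(a) + h_hat(b) - h_hat(a | b)

i_with_a1 = mutual_info({"A0"}, {"A1", "M", "Z"})
i_without_a1 = mutual_info({"A0"}, {"M", "Z"})
i_pair = mutual_info({"A0", "A1"}, {"M", "Z"})
```

Under these assumptions `i_with_a1` is 0 while `i_without_a1` is 1, so discarding $A_1$ raises the mutual information, and the information about the pair, `i_pair`, is one bit.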



Although the violation of (56) in box world, and its validity in quantum theory, is a very interesting result, it is worth considering whether this really implies that communicating $k$ bits has caused an information gain of more than $k$ bits. From the state (57) it is easy to check that

$$\hat{I}(A_0 A_1; M Z) = \hat{I}_+(A_0 A_1; M Z) = 1 \le 1, \tag{60}$$

hence under both these measures the total information about the composite system $A_0 A_1$ has only increased by one bit due to the one bit classical message. We show in Section C 2 in the appendix that in box world we indeed have that given some arbitrary system $Z$ held by Bob, the mutual information about a classical string $A$ can never increase by more than the length of a classical message $M$ that is transmitted. Furthermore, Bob can extract only one of the two bits, either $A_0$ or $A_1$, with
the help of the message as is indeed noted in It is 
therefore arguable that the information gain of Bob is only one bit. Perhaps 'information causality' should be restated in a clearer way that more directly represents the form of (56), e.g. as the principle that an $m$ bit classical communication allows us to learn any one out of at most $m$ unknown bits.



VIII. A SIMPLE CODING THEOREM 

We now show that for some theories, the entropic quantity $\hat{H}(\cdot)$ has an appealing operational interpretation in capturing our ability to compress information. Here, we will only show this for theories obeying further restrictions, and it is an interesting open question how generally this interpretation applies.



A. Dimension and subspaces 

Before we can talk about compression, we first need to clarify our notions of the size of a system. Intuitively, the size of a system should limit the amount of uncertainty we can have about it. Furthermore, to compress, we will clearly need to shrink the original state space. It is therefore helpful to define a notion of size for any subset of allowed states $\mathcal{S}_T \subseteq \mathcal{S}$.

We refer to the size of a set of states $\mathcal{S}_T$ as its dimension $d$, which we define by

$$d := \min_{e \in \mathcal{E}^*} \left| \{ r \in \mathcal{R}_e \mid \exists S \in \mathcal{S}_T,\ e_r(S) > 0 \} \right|. \tag{61}$$

This corresponds to eliminating all measurement outcomes that cannot occur for any state in $\mathcal{S}_T$, and then counting the minimal number of remaining outcomes for any fine-grained measurement. It follows that $\log d \ge \hat{H}(S)$ for all $S \in \mathcal{S}_T$. In quantum theory $d$ corresponds precisely to the dimension of a Hilbert space.

A natural way to select a subset of states is to consider all states that yield a given measurement outcome with certainty. We refer to an effect $f$ such that $\{f, u - f\}$ is an allowed measurement, and that occurs with certainty for some state, as a full effect (i.e. $f$ is full if there exists $S \in \mathcal{S}$ such that $f(S) = 1$). For any full effect $f$, we can therefore define a non-empty subset of states $\mathcal{S}_f = \{S \mid S \in \mathcal{S},\ f(S) = 1\}$. We refer to such a subset as the subspace of $\mathcal{S}$ given by $f$. Note that subspaces are always convex, and the subspace corresponding to an effect $f$ which is both full and fine-grained obeys $d_f = 1$.

We say that we have compressed a state if we have constrained it to lie within a set of states of smaller dimension.



B. Additional assumptions 

So far, we were never concerned about what happens to a state after a measurement. In our compression protocol, however, we will need to use an abstract notion of post-measurement states as described in Section [II C]. In particular, we will consider pseudo-projective measurements, which we define to be measurements that fulfill two conditions.

1. (Repeatability) A pseudo-projective measurement is repeatable, such that if the same measurement is applied again the same result is obtained. This requires that the output state $S_r$ after obtaining a result $r$ lies in the subspace given by $e_r$ (i.e., $e_r(S_r) = 1$). Consequently, all effects in a pseudo-projective measurement must be full effects.

2. (Weak Disturbance) If a particular outcome $r$ of a pseudo-projective measurement occurs with probability $e_r(S) \ge 1 - \delta$ for a state $S$, then the post-measurement state $S_r$ after this result is obtained satisfies $e_r(S)\, \mathcal{D}(S, S_r) \le c\, \delta^{\epsilon}$, where $c > 0$ and $\epsilon \in (0, 1]$ are constants depending on the particular theory. For example, for projective measurements in quantum theory $c = (\sqrt{8} + 1)/2$ and $\epsilon = 1/2$.

Any projective measurement in quantum theory fulfills these conditions, but these conditions alone do not define projective measurements, hence the slightly different name. In quantum theory, the weak disturbance property can be understood as an instance of the gentle measurement lemma [43].

Furthermore, in order to prove our simple coding theorem, we will need to make some additional assumptions on the states and the measurements that achieve the minimal output entropy $\hat{H}(\cdot)$ in our theory. In particular, we assume that for all states, the minimal output entropy can be attained by a pseudo-projective measurement. That is, we assume that for all $S \in \mathcal{S}$ there exists some pseudo-projective measurement $e \in \mathcal{E}^*$ such that $\hat{H}(S) = H(e(S))$. We further assume that for all such measurements, $e^{\otimes n}$ is fine-grained and pseudo-projective, and that coarse grainings of $e^{\otimes n}$ can also be made pseudo-projective. Lastly, we assume that the dimension of $\mathcal{S}^{\otimes n}$ is $d^n$. These assumptions are all true in the classical and quantum case (where $e$ is projective).

We will see in Appendix [E] that this is all we will need to show the following simple coding theorem following the steps taken by Shannon [3T] and Schumacher [317] (see for example [24]).

C. Compression 

We consider a source that emits a state $S_k \in \mathcal{S}$ with probability $q_k$, chosen independently at random in each time step. When considering $n$ time steps, we hence obtain a sequence of states $S_{\vec{k}} = S_{k_1}, \ldots, S_{k_n} \in \mathcal{S}^{\otimes n}$ with $\vec{k} = (k_1, \ldots, k_n)$, where each sequence occurs with probability $q_{\vec{k}} = \prod_j q_{k_j}$. A compression scheme consists of an encoding and decoding procedure. The encoding procedure maps each possible $S_{\vec{k}}$ into a state $\tilde{S}_{\vec{k}} \in \mathcal{S}_f \subseteq \mathcal{S}^{\otimes n}$. In turn the decoding procedure maps the states $\tilde{S}_{\vec{k}}$ back to states $S'_{\vec{k}} \in \mathcal{S}^{\otimes n}$ on the original state space. In analogy with the quantum case, we say that the compression scheme has rate $R$, if the dimension of the smaller space obeys $d_f \le 2^{nR}$. Note that in order for a compression scheme to be useful, it must have $R < \log d$ (and hence $d_f < d^n$). A compression scheme is called reliable, if we can recover the original state (almost) perfectly, in the sense that the average distance between the original and the reconstructed state can be made arbitrarily small for sufficiently large $n$. I.e. for any $\epsilon > 0$ and all sufficiently large $n$,

$$\sum_{\vec{k}} q_{\vec{k}}\, \mathcal{D}(S_{\vec{k}}, S'_{\vec{k}}) \le \epsilon. \tag{62}$$

Note that the output of the source can be described as a mixed state ${\rm Src} = \sum_k q_k S_k$ in each time step, and a product state ${\rm Src}^{\otimes n} \in \mathcal{S}^{\otimes n}$ over the course of $n$ time steps. We then obtain the following theorem (see appendix Section [E]) in terms of the entropy of the source $\hat{H}({\rm Src})$.

Theorem VIII.1. Consider an i.i.d. source $\{q_k, S_k \in \mathcal{S}\}_k$ with entropy rate $\hat{H}({\rm Src})$. Then for $R > \hat{H}({\rm Src})$ there exists a reliable compression scheme with rate $R$.
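In the classical special case the theorem reduces to Shannon's source coding theorem, which can be illustrated numerically with typical sequences. The sketch below is not the general construction from the appendix: it assumes a classical binary i.i.d. source, and the bias `p`, block length `n` and tolerance `delta` are hypothetical values chosen for illustration. It counts the sequences whose empirical log-probability is within `delta` of the entropy and reports the rate needed to index them and the probability mass they carry.

```python
from math import comb, log2

def typical_set_stats(p, n, delta):
    """Entropy, size and total probability of the delta-typical set of a Bernoulli(p) source."""
    h = -p * log2(p) - (1 - p) * log2(1 - p)
    size, mass = 0, 0.0
    for k in range(n + 1):                        # k = number of ones in the sequence
        log_prob = k * log2(p) + (n - k) * log2(1 - p)
        if abs(-log_prob / n - h) <= delta:       # typicality condition
            count = comb(n, k)                    # sequences with exactly k ones
            size += count
            mass += 2.0 ** (log2(count) + log_prob)
    return h, size, mass

n = 1000
h, size, mass = typical_set_stats(p=0.1, n=n, delta=0.05)
rate = log2(size) / n   # bits per symbol needed to label every typical sequence
```

For these parameters the rate stays below `h + delta`, well under the one bit per symbol of the uncompressed source, while most of the probability mass lies in the typical set; the mass approaches one as `n` grows.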

Note that in order to establish that $\hat{H}(\cdot)$ truly characterizes our ability to compress information, we would also like to have a converse stating that for $R < \hat{H}({\rm Src})$ there exists no reliable compression scheme. In quantum theory, it is not hard to prove the converse of the above theorem since it admits a strong duality between states and measurements, which may also hold for other theories. Here, however, we explicitly tried to avoid introducing any such strong assumptions.



IX. CONCLUSION AND OPEN QUESTIONS 

We introduced entropic measures to quantify informa- 
tion in any physical theory that admits minimal notions 
of systems, states and measurements. Even though these 
measures necessarily have some limitations, we neverthe- 
less showed that they also exhibit many intuitive proper- 
ties, and for some theories have an appealing operational 
interpretation, quantifying our ability to compress states. 
Most of the problems we encountered with the conditional entropy seem to arise due to a violation of strong subadditivity. It is an interesting question whether quantum and classical theories are the only ones in which $\hat{H}$ is strongly subadditive, or whether this is true for other theories. Indeed, it would be an exciting question to turn things around and start by demanding that our entropic measures satisfy these properties, and determine how this restricts the set of possible theories.

In $\hat{H}_+(\cdot|\cdot)$ we defined a natural entropic quantity which differs from the conditional von Neumann entropy in quantum theory, and has been used in [12] to study quantum correlations. It would be interesting to study whether this quantity can shed any further light on quantum phenomena, or if an alternative conditional entropy can be defined that behaves like $\hat{H}_+(\cdot|\cdot)$ in box world, but still reduces to the conditional von Neumann entropy in quantum theory.

Whereas we have proved some intuitive properties of 
our quantities, it is interesting to see whether other prop- 
erties of the von Neumann or Shannon entropy carry over 
to this setting. In particular, it would be interesting to 
prove bounds on the mutual and accessible information 
analogous to Holevo's theorem when none of the systems 
are classical. 

Another interesting question is whether one can find a closed form expression for the relative entropy in general theories. In quantum theory, we can define the mutual information (and indeed the entropy itself) in terms of the relative entropy [49]. Hence such an approach may also yield an alternative definition of other entropic quantities for general theories.

We believe our measures are an interesting step to- 
wards understanding information processing in general 
physical theories, which may in turn shed some light on 
our own quantum world. 
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Appendix A: Distance metric 



[49] 



We now show that the quantity 
on the state space S. 

Lemma A.l. V : S x S - 

metric on the state space S 



is indeed a metric 
0,1] as defined in ([8]) is a 
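Proof. Consider states $S_0, S_1, S_2 \in \mathcal{S}$. Clearly,

$$\mathcal{D}(S_0, S_1) \ge 0 \qquad (A1)$$

using the property of the classical statistical distance, where equality
holds iff $S_0 = S_1$ by definition of the state space $\mathcal{S}$. It
remains to show that $\mathcal{D}$ obeys a triangle inequality. Let
$e_{ij}$ be the optimal measurement to distinguish states $S_i$ and $S_j$.
We then have

$$\mathcal{D}(S_0,S_1) + \mathcal{D}(S_1,S_2)
  \ge C(e_{02}(S_0), e_{02}(S_1)) + C(e_{02}(S_1), e_{02}(S_2))
  \ge C(e_{02}(S_0), e_{02}(S_2))
  = \mathcal{D}(S_0, S_2), \qquad (A2)$$

where the first inequality holds because $\mathcal{D}$ is an optimum over
all measurements, so that the particular measurement $e_{02}$ can do no
better, and the second inequality follows from the fact that the
classical statistical distance $C$ itself obeys the triangle
inequality. □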
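The argument above can be checked numerically on a toy example. The following is a minimal sketch, not taken from the paper: it assumes a single box with binary input and output, assumes that the only measurements are the two input choices, and uses the hypothetical helper names `statistical_distance`, `outcomes` and `distance`.

```python
from itertools import product

def statistical_distance(p, q):
    """Classical statistical (total variation) distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def outcomes(state, x):
    """Outcome distribution of a binary-input, binary-output box for input x.

    A state is the tuple (P(0|0), P(1|0), P(0|1), P(1|1)).
    """
    return state[2 * x: 2 * x + 2]

def distance(s, t):
    """Largest statistical distance over the two measurements (toy assumption)."""
    return max(statistical_distance(outcomes(s, x), outcomes(t, x)) for x in (0, 1))

# A small grid of valid states: each input has a normalised output distribution.
grid = [i / 4 for i in range(5)]
states = [(a, 1 - a, b, 1 - b) for a, b in product(grid, grid)]

# Collect every triple that would violate the triangle inequality.
violations = [
    (s0, s1, s2)
    for s0, s1, s2 in product(states, repeat=3)
    if distance(s0, s2) > distance(s0, s1) + distance(s1, s2) + 1e-12
]
print(len(violations))
```

On this grid no triple violates the triangle inequality, and the distance vanishes only for identical states, in line with (A1) and (A2).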



Appendix B: Properties of $\hat{H}$

In this appendix we derive properties of the entropy $\hat{H}$ used in
the paper. Note that by assumption $\mathcal{E}^*$ is non-empty, which
implies that $\hat{H}(S)$ is well-defined.



1. Reduction to the von Neumann and Shannon entropy

We now show that the entropic quantity (18) reduces to the von Neumann
and Shannon entropy in the quantum and classical settings respectively.
For the relation to the von Neumann entropy, we will need the following
little lemma.

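Lemma B.1. Let $\rho \in \mathcal{B}(\mathcal{H})$ be a quantum state with
eigendecomposition $\rho = \sum_j p_j |\psi_j\rangle\langle\psi_j|$. Then

$$\hat{H}(\rho) = S(\rho) = H(\vec{p}), \qquad (B1)$$

where $\vec{p} = (p_1, \ldots, p_d)$ with $d = \dim(\mathcal{H})$.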



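Proof. Our goal will be to show that for any fine-grained measurement
$e$ with

$$e_\ell = c_\ell |\phi_\ell\rangle\langle\phi_\ell| \in \mathcal{B}(\mathcal{H}),
  \quad 0 < c_\ell \le 1,
  \quad \text{and} \quad \sum_\ell c_\ell |\phi_\ell\rangle\langle\phi_\ell| = I, \qquad (B2)$$

the Shannon entropy of the distribution
$q_\ell := c_\ell \langle\phi_\ell|\rho|\phi_\ell\rangle$ is always at least
as large as that of the distribution obtained by measuring in the
eigenbasis of $\rho$, that is,

$$H(\vec{p}) \le H(\vec{q}) \qquad (B3)$$

with $\vec{q} = (q_1, \ldots, q_N)$.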

Let $N = |e|$ and note that $d \le N$. First of all, note that we can
always extend a distribution $\{p_j\}$ over $d$ elements to a distribution
$\{\tilde{p}_j\}$ over $N$ elements by letting $\tilde{p}_j = p_j$ for all
$j \le d$ and $\tilde{p}_j = 0$ for all $j > d$. Clearly,
$H(\vec{p}) = H(\tilde{p})$ with $\tilde{p} = (\tilde{p}_1, \ldots, \tilde{p}_N)$.

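Second, note that

$$q_\ell = \sum_j p_j\, q_{\ell|j}
  \quad \text{and} \quad
  q_{\ell|j} = c_\ell\, |\langle\phi_\ell|\psi_j\rangle|^2, \qquad (B4)$$

from which we immediately obtain together with (B2) that

$$\sum_j q_{\ell|j} = c_\ell
  \quad \text{and} \quad
  \sum_\ell q_{\ell|j} = 1. \qquad (B5)$$
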



Consider the $N \times N$ matrix $M$ determined by the entries

$$M_{\ell,j} = \begin{cases}
  q_{\ell|j} & \text{for } j \le d, \\
  \dfrac{1 - c_\ell}{N - d} & \text{for } j > d,
\end{cases} \qquad (B6)$$

which allows us to write $\vec{q} = M\tilde{p}$. Note that since
$M_{\ell,j} \ge 0$ and $\sum_\ell M_{\ell,j} = \sum_j M_{\ell,j} = 1$, $M$ is
a doubly stochastic matrix. Using Birkhoff's theorem (see e.g.
[T9, Theorem 8.7.1]), we may thus write $M$ as a convex combination of
permutation matrices, that is,



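$$M = \sum_{\pi \in S_N} P(\pi)\, \pi, \qquad (B7)$$

where $P$ is a probability distribution over the group of permutations
$S_N$. Using the concavity of the Shannon entropy we obtain

$$H(\vec{q}) \ge \sum_{\pi \in S_N} P(\pi)\, H(\pi(\tilde{p})) = H(\tilde{p}) = H(\vec{p}). \qquad (B8)$$
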

As we can always measure $\rho$ in its eigenbasis it follows that

$$\hat{H}(\rho) = \inf_{e \in \mathcal{E}^*} H(e(\rho)) = \inf_{\vec{q}} H(\vec{q}) = H(\vec{p}), \qquad (B9)$$

and it is easy to see that $H(\vec{p}) = S(\rho)$. □



Since the von Neumann entropy reduces to the Shannon entropy in a
classical setting, this also shows that the entropic quantity (18)
reduces to the Shannon entropy in the classical case.
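As a small numerical illustration of the lemma, the sketch below measures a qubit state that is diagonal in its eigenbasis, with eigenvalues $(p, 1-p)$, in an orthonormal basis rotated by an angle `theta`. The outcome distribution is then the eigenvalue vector multiplied by a doubly stochastic matrix. The function names and the restriction to real rotations of a single qubit are assumptions made for this example only.

```python
import math

def shannon(dist):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def outcome_probs(p, theta):
    """Outcome distribution when rho = diag(p, 1 - p) is measured in the
    orthonormal basis obtained by rotating the eigenbasis by theta."""
    c2 = math.cos(theta) ** 2
    # q = M (p, 1 - p) with the doubly stochastic M = [[c2, 1 - c2], [1 - c2, c2]]
    q0 = p * c2 + (1 - p) * (1 - c2)
    return (q0, 1 - q0)

p = 0.9
eigenbasis_entropy = shannon((p, 1 - p))  # equals the von Neumann entropy S(rho)
entropies = [shannon(outcome_probs(p, k * math.pi / 200)) for k in range(201)]
min_entropy = min(entropies)
print(eigenbasis_entropy, min_entropy)
```

Over this family of measurements the minimum output entropy is attained in the eigenbasis, consistent with (B3) and (B9).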



2. Positivity, Boundedness, and Concavity

Here we prove the other general properties of the entropy $\hat{H}$.

Positivity: This follows trivially from the positivity of the Shannon
entropy.

Boundedness: The existence of a measurement $e \in \mathcal{E}^*$ with $d$
outcomes, combined with the fact that the Shannon entropy is maximized
for a uniform probability distribution, ensures that

$$\hat{H}(S) \le H(e(S)) \le \log(d), \qquad (B10)$$

which gives boundedness.

Concavity: To see that $\hat{H}$ is concave, suppose first that the
infimum in the definition (18) of $\hat{H}(S_{\rm mix})$ is achieved, such
that $\hat{H}(S_{\rm mix}) = H(e(S_{\rm mix}))$ for some $e \in \mathcal{E}^*$.
As effects are linear maps, $e(S_{\rm mix}) = p\,e(S_1) + (1-p)\,e(S_2)$.
Hence, by the concavity of the Shannon entropy

$$\hat{H}(S_{\rm mix}) = H(e(S_{\rm mix}))
  \ge p\,H(e(S_1)) + (1-p)\,H(e(S_2))
  \ge p\,\hat{H}(S_1) + (1-p)\,\hat{H}(S_2), \qquad (B11)$$

which concludes our claim. On the other hand, if the infimum is not
achievable then for all sufficiently small $\delta > 0$ we can find an
$e \in \mathcal{E}^*$ such that $\hat{H}(S_{\rm mix}) = H(e(S_{\rm mix})) - \delta$.
Using the same argument as before, we find

$$\hat{H}(S_{\rm mix}) \ge p\,\hat{H}(S_1) + (1-p)\,\hat{H}(S_2) - \delta. \qquad (B12)$$

As this holds for all sufficiently small $\delta$ the result follows.

3. Limited Subadditivity and Continuity

Here we prove two properties of $\hat{H}$ that require additional minor
assumptions on our theory. However, they are obeyed in quantum theory,
classical theory and box world.

Limited Subadditivity: Given an additional reasonable assumption, we can
prove that $\hat{H}$ is subadditive. We first assume that there exist
$e \in \mathcal{E}^*_A$ and $f \in \mathcal{E}^*_B$ such that
$\hat{H}(A) = H(e(A))$ and $\hat{H}(B) = H(f(B))$. By assumption,
$e \otimes f$ is a fine-grained measurement on the joint system $AB$.
Thus by the subadditivity of the Shannon entropy

$$\hat{H}(A) + \hat{H}(B) = H(e(A)) + H(f(B))
  \ge H((e \otimes f)(AB))
  \ge \hat{H}(AB), \qquad (B13)$$

as claimed. Now suppose that the infimum for one or both of $\hat{H}(A)$
or $\hat{H}(B)$ is not achieved. Then for all sufficiently small
$\delta > 0$ we can find $e \in \mathcal{E}^*_A$ and $f \in \mathcal{E}^*_B$
such that

$$\hat{H}(A) + \hat{H}(B) = H(e(A)) + H(f(B)) - \delta \ge \hat{H}(AB) - \delta. \qquad (B14)$$

As this holds for all sufficiently small $\delta > 0$ the result
follows.

Note that if $A$ and $B$ are in a product state, and the theory only
allows product measurements on $AB$, then equality holds in (22).
However, if we allow an arbitrary set of joint measurements, equality
for product states need not hold in every possible probabilistic theory
(consider the case in which $\hat{H}(A) > \log 2$, but there exists a
fine-grained measurement on $AB$ with only 2 outcomes).

Limited Continuity: Here we prove an analogue of the Fannes inequality
[14], given the additional reasonable assumption that we can restrict to
measurements with at most $D$ outcomes without changing the entropy of a
system.

Suppose without loss of generality that $\hat{H}(S_1) \ge \hat{H}(S_2)$.
Initially, we also suppose that the infimum in the definition of
$\hat{H}(S_2)$ is achieved for some $f \in \mathcal{E}^*$, such that
$\hat{H}(S_2) = H(f(S_2))$. We can then bound

$$|\hat{H}(S_1) - \hat{H}(S_2)|
  \le |H(f(S_1)) - H(f(S_2))|
  \le C(f(S_1), f(S_2)) \log\!\left(\frac{D}{C(f(S_1), f(S_2))}\right)
  \le \mathcal{D}(S_1, S_2) \log\!\left(\frac{D}{\mathcal{D}(S_1, S_2)}\right), \qquad (B15)$$

where the first inequality follows from the fact that
$\hat{H}(S_1) \le H(f(S_1))$, the second from Fannes inequality [14]
applied to the classical case, and the final inequality by noting that

$$C(f(S_1), f(S_2)) \le \mathcal{D}(S_1, S_2) \le \frac{1}{e}. \qquad (B16)$$

If the infimum is not achieved, then for all sufficiently small
$\delta > 0$ there nevertheless exists $f \in \mathcal{E}^*$ such that
$\hat{H}(S_2) = H(f(S_2)) - \delta$. Following the same procedure as
before, we find

$$|\hat{H}(S_1) - \hat{H}(S_2)| \le \mathcal{D}(S_1, S_2) \log\!\left(\frac{D}{\mathcal{D}(S_1, S_2)}\right) + \delta, \qquad (B17)$$

from which the result follows.

Appendix C: Properties of the conditional entropy 

1. General case 

We now show that, in contrast to the quantity $\hat{H}(\cdot|\cdot)$, our
second form of conditional entropy $\hat{H}_+$ obeys the intuitive
property that conditioning reduces entropy in all cases.

Lemma C.1 (Conditioning reduces entropy for $\hat{H}_+$). For any
tripartite state $ABC \in \mathcal{S}_{ABC}$ and its corresponding reduced
states we have

$$\hat{H}(A) \ge \hat{H}_+(A|B) \ge \hat{H}_+(A|BC). \qquad (C1)$$

Proof. The first inequality follows by choosing the unit measurement in
the infimum over $\mathcal{E}_B$ in the definition of $\hat{H}_+(A|B)$, and
noting that $\hat{H}(A) = u(B)\,\hat{H}(A|u) \ge \hat{H}_+(A|B)$. The second
inequality comes from restricting to measurements of the form
$e_B \otimes u_C$ in the infimum over $\mathcal{E}_{BC}$ in the definition
of $\hat{H}_+(A|BC)$. □

2. Box world 

We now prove a very restricted form of chain rule in box world. This
will allow us to show that for our notions of entropy the mutual
information about any classical information given an arbitrary state in
box world can never increase by more than $\ell$ bits when transmitting
$\ell$ bits of information. To show our simple chain rule, we will use
the fact that in box world, when considering a composite of a classical
system $M$ and an arbitrary system $B$, the only allowed measurements on
the composite system $MB$ take the form of first performing the only
allowed measurement on $M$, followed by a choice of measurement on $B$
that may depend on the outcome of the measurement on $M$. Since
classical systems in box world admit exactly one measurement (possibly
followed by some classical post-processing), we simply write
$\hat{H}(M) = H(M)$ to denote the resulting entropy.

Lemma C.2 (Box chain rule). For any tripartite state
$CMB \in \mathcal{S}_{CMB}$ in box world and its corresponding reduced
states, where $C$ and $M$ are classical, we have

$$\hat{H}_+(C|MB) \ge \hat{H}_+(CM|B) - H(M). \qquad (C2)$$

Proof. For simplicity, we only examine the case where the infimum is
attained in $\hat{H}_+$; the other case can again be obtained by taking
the appropriate limit. Since the only measurements on $MB$ are as
described above, we clearly have

$$\begin{aligned}
\hat{H}_+(C|MB) &= \sum_m e_m(M) \sum_k f_k(B_{|m})\, \hat{H}(C_{|m,k}) \\
  &= \sum_m e_m(M) \sum_k f_k(B_{|m})\, H(C|M = m, K = k) \\
  &= H(C|M,K) = H(CM|K) - H(M|K) \\
  &\ge \hat{H}_+(CM|B) - H(M),
\end{aligned} \qquad (C3)$$

where the first equality follows from the definition of $\hat{H}_+$ and
the fact that $M$ is classical, the second from the definition of the
conditional Shannon entropy, the third from the chain rule for the
conditional Shannon entropy, and the final inequality from the
definition of $\hat{H}_+$, the fact that $\hat{H}(M) = H(e(M))$ for
classical systems and the fact that conditioning reduces entropy for the
Shannon entropy. □

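We now see that, consistent with the no-signalling principle, the
transmission of an $\ell$-bit message $M$ causes the mutual information
about a classical system $C$ given access to some arbitrary box
information $B$ to increase by at most $\ell$ bits. Note that for our
alternate definition of conditional entropy and mutual information we
have

$$I_+(C;MB) = H(C) - \hat{H}_+(C|MB). \qquad (C4)$$

First, note that we can write

$$\begin{aligned}
I_+(C;MB) &= I_+(C;B) + I_+(C;M|B), \\
I_+(C;B) &= H(C) - \hat{H}_+(C|B), \\
I_+(C;M|B) &:= \hat{H}_+(C|B) - \hat{H}_+(C|MB),
\end{aligned} \qquad (C5)$$

by definition. We hence have

$$I_+(C;MB) \le H(C) + H(M) - \hat{H}_+(CM|B)
  \le I_+(C;B) + H(M)
  \le I_+(C;B) + \ell, \qquad (C6)$$

where the first inequality uses (C4) together with Lemma C.2, the second
uses $\hat{H}_+(CM|B) \ge \hat{H}_+(C|B)$ for classical $C$ and $M$, and the
last uses $H(M) \le \ell$ for an $\ell$-bit message.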
Appendix D: Properties of $\check{H}$

In this section we explore properties of the decomposition entropy
$\check{H}$.



1. Reduction to the von Neumann and Shannon 
entropy 

To show the reduction of $\check{H}(\rho)$ to the von Neumann entropy
$S(\rho)$ in quantum theory we use the following lemma.

Lemma D.1 (Theorem 11.10 in [H]). Suppose $\rho = \sum_i p_i \rho_i$,
where $p_i$ are some set of probabilities and $\rho_i$ are density
operators. Then

$$S(\rho) \le \sum_i p_i S(\rho_i) + H(p_i), \qquad (D1)$$

with equality if and only if the states $\rho_i$ have support on
orthogonal subspaces.

Note that when the $\rho_i$ are pure states, $S(\rho_i) = 0$. Hence for
any pure state decomposition $D(\rho)$, this implies

$$S(\rho) \le H(D(\rho)). \qquad (D2)$$

Furthermore, denoting an eigendecomposition of $\rho$ by $D^*(\rho)$, it
is easy to see that $H(D^*(\rho)) = S(\rho)$. Hence it follows that

$$S(\rho) = \check{H}(\rho) = \inf_{D(\rho)} H(D(\rho)). \qquad (D3)$$
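The inequality (D2) and its equality condition can be illustrated on a single qubit. The sketch below is an illustration under stated assumptions rather than part of the paper: it restricts to real pure states $|\psi_i\rangle = (\cos a_i, \sin a_i)$, and the helper names `mix_real_qubit_states` and `von_neumann_entropy` are hypothetical.

```python
import math

def shannon(dist):
    """Shannon entropy in bits, ignoring non-positive entries."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def mix_real_qubit_states(weights, angles):
    """Entries (r00, r01, r11) of rho = sum_i w_i |psi_i><psi_i| for real
    pure states |psi_i> = (cos a_i, sin a_i)."""
    r00 = sum(w * math.cos(a) ** 2 for w, a in zip(weights, angles))
    r01 = sum(w * math.cos(a) * math.sin(a) for w, a in zip(weights, angles))
    r11 = sum(w * math.sin(a) ** 2 for w, a in zip(weights, angles))
    return r00, r01, r11

def von_neumann_entropy(r00, r01, r11):
    """Entropy of the eigenvalues of the real symmetric matrix [[r00, r01], [r01, r11]]."""
    tr, det = r00 + r11, r00 * r11 - r01 ** 2
    disc = math.sqrt(max(0.0, tr * tr - 4 * det))
    return shannon(((tr + disc) / 2, (tr - disc) / 2))

# Non-orthogonal decomposition: equal mixture of |0> and |+>.
weights, angles = (0.5, 0.5), (0.0, math.pi / 4)
s_nonorth = von_neumann_entropy(*mix_real_qubit_states(weights, angles))
h_nonorth = shannon(weights)

# Orthogonal decomposition (an eigendecomposition): |0> and |1>.
weights_orth, angles_orth = (0.3, 0.7), (0.0, math.pi / 2)
s_orth = von_neumann_entropy(*mix_real_qubit_states(weights_orth, angles_orth))
h_orth = shannon(weights_orth)
print(s_nonorth, h_nonorth, s_orth, h_orth)
```

For the non-orthogonal decomposition the Shannon entropy of the weights strictly exceeds $S(\rho)$, while for the orthogonal one the two coincide.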



2. Subadditivity and concavity 

In this section we will show that $\check{H}$ is neither concave nor
subadditive by giving explicit counterexamples from box world.

First consider a single box with binary input/output. For clarity, we
will represent its state by giving its probability distribution
$P(a|x)$ in vector form:



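$$S = \begin{pmatrix} P(0|0) \\ P(1|0) \\ P(0|1) \\ P(1|1) \end{pmatrix}. \qquad (D4)$$

Now consider the two states

$$S_1 = \begin{pmatrix} 1 \\ 0 \\ 1/2 \\ 1/2 \end{pmatrix}, \qquad
  S_2 = \begin{pmatrix} 1/2 \\ 1/2 \\ 1 \\ 0 \end{pmatrix}. \qquad (D5)$$

These can both be optimally decomposed into two equally weighted pure
states, e.g.

$$S_1 = \frac{1}{2} \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}
      + \frac{1}{2} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}, \qquad (D6)$$

hence they satisfy $\check{H}(S_1) = \check{H}(S_2) = \log 2 = 1$. However,
now consider the mixed state

$$S_{\rm mix} = \frac{1}{2} S_1 + \frac{1}{2} S_2
  = \begin{pmatrix} 3/4 \\ 1/4 \\ 3/4 \\ 1/4 \end{pmatrix}
  = \frac{3}{4} \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}
  + \frac{1}{4} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 1 \end{pmatrix}, \qquad (D7)$$

which has $\check{H}(S_{\rm mix}) = H((\tfrac{3}{4}, \tfrac{1}{4})) < 1$. Hence
in this case we violate concavity:

$$\check{H}(S_{\rm mix}) < \frac{1}{2} \check{H}(S_1) + \frac{1}{2} \check{H}(S_2). \qquad (D8)$$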
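The concavity counterexample can be reproduced with exact arithmetic. The sketch below is only a consistency check under stated assumptions: states are written as tuples $(P(0|0), P(1|0), P(0|1), P(1|1))$, the names `P00`, `P01`, `P10`, `P11` for the four deterministic pure states are hypothetical labels, and the code verifies that the stated decompositions reproduce the states without proving that they are optimal.

```python
import math
from fractions import Fraction as F

def shannon(weights):
    """Shannon entropy in bits of a list of decomposition weights."""
    return -sum(float(w) * math.log2(float(w)) for w in weights if w > 0)

def mix(weights, states):
    """Convex combination of single-box states given as 4-tuples."""
    return tuple(sum(w * s[i] for w, s in zip(weights, states)) for i in range(4))

# Deterministic pure states; Pab outputs a on input 0 and b on input 1.
P00, P01 = (1, 0, 1, 0), (1, 0, 0, 1)
P10, P11 = (0, 1, 1, 0), (0, 1, 0, 1)

half = F(1, 2)
S1 = mix((half, half), (P00, P01))      # (1, 0, 1/2, 1/2)
S2 = mix((half, half), (P00, P10))      # (1/2, 1/2, 1, 0)
S_mix = mix((half, half), (S1, S2))     # (3/4, 1/4, 3/4, 1/4)

# The mixture also admits a decomposition into only two pure states.
mix_weights = (F(3, 4), F(1, 4))
S_mix_alt = mix(mix_weights, (P00, P11))

h_S1 = shannon((half, half))            # entropy of the decomposition of S1
h_S2 = shannon((half, half))            # entropy of the decomposition of S2
h_mix = shannon(mix_weights)            # upper bound on the entropy of S_mix
print(S_mix == S_mix_alt, h_mix, 0.5 * h_S1 + 0.5 * h_S2)
```

The decomposition of the mixture has entropy $H((3/4, 1/4)) \approx 0.81$ bits, below the average of the two one-bit decompositions.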



To obtain a violation of subadditivity we consider a bipartite state in
which each system has a binary input/output, represented in the form of
a matrix

$$S_{AB} = \begin{pmatrix}
P(00|00) & P(01|00) & P(00|01) & P(01|01) \\
P(10|00) & P(11|00) & P(10|01) & P(11|01) \\
P(00|10) & P(01|10) & P(00|11) & P(01|11) \\
P(10|10) & P(11|10) & P(10|11) & P(11|11)
\end{pmatrix}. \qquad (D9)$$



Choose the following allowed state



Sai 





(2 


3 


2 


3\ 


1 


3 





3 





8 


5 





2 


3 




\o 


3 


3 


0/ 



A = B 



3 
5 

V3/ 



(D10) 

It is known that in this case there are exactly 24 pure states for the
bipartite binary input/output case (16 product states and 8 entangled
states) [3], which we denote by $S^i_{AB}$. By demanding that
$S_{AB} - p_i S^i_{AB}$ be a positive matrix for each pure state we find
that any decomposition must satisfy $p_i \le \tfrac{1}{4}\ \forall i$. Hence
$\check{H}(AB) = \inf_{D} H(p_i) \ge 2$. In fact we can construct an
explicit decomposition in terms of an entangled state and three product
states (all equally weighted), giving $\check{H}(AB) = 2$. The marginal
states on the other hand satisfy



R(A) = H(P) = H 



3 5 



< 1. 



Hence we obtain

$$\check{H}(AB) > \check{H}(A) + \check{H}(B)$$

in violation of subadditivity.



(Dll) 



(D12) 
Appendix E: A simple coding theorem



We now sketch the proof of Theorem VIII.1, which follows
straightforwardly from the steps taken in the quantum setting [5U].

Consider the pseudo-projective measurement e that 
gives the minimal output entropy for the state Src, which 
we take to exist by assumption. 

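At the core of our little coding theorem lies an observation about
$\epsilon$-typical sequences analogous to the classical and quantum
setting. Define the set of $\epsilon$-typical outcomes when measuring
$e^{\otimes n}$ on the state $\mathrm{Src}^{\otimes n} \in \mathcal{S}^{\otimes n}$ as

$$T(n,\epsilon) := \left\{ r_1, \ldots, r_n \in \mathcal{R} \;:\;
  \left| \frac{1}{n} \log \frac{1}{e_{r_1}(\mathrm{Src}) \cdots e_{r_n}(\mathrm{Src})}
  - \hat{H}(\mathrm{Src}) \right| \le \epsilon \right\}. \qquad (E1)$$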



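When $n$ and $\epsilon$ are clear from context, we will also use the
effects

$$h_T(S) := \sum_{\vec{r} \in T(n,\epsilon)} e_{\vec{r}}(S), \qquad (E2)$$

$$h_A := u - h_T, \qquad (E3)$$

where $e_{\vec{r}} = e_{r_1} \otimes \cdots \otimes e_{r_n}$ and $u$ denotes
the unit effect.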



Since we assumed that any theory contains arbitrary coarse-grainings of
measurements, we can consider the measurement

$$h := \{(T, h_T), (A, h_A)\}, \qquad (E4)$$

which by assumption we can make pseudo-projective. We refer to the
subspaces given by $h_T$ and $h_A$ as the typical and atypical subspaces
respectively. If we observe outcome 'T' for the measurement $h$, we
conclude that a state lies in the typical subspace associated with the
set $T(n,\epsilon)$. Otherwise, we conclude that the state lies in the
atypical subspace.

Note that by assumption we have that $e^{\otimes n}$ is a fine-grained
measurement. For all states in the typical subspace, only outcomes in
the typical set $T(n,\epsilon)$ will occur. Hence we have that the
dimension of the typical subspace satisfies $d_T \le |T(n,\epsilon)|$.

We are now ready to prove the following theorem: 

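Theorem E.1 (Typical subspace theorem). Let all quantities be defined
as above. Fix $\epsilon > 0$, then for any $\delta > 0$ and sufficiently
large $n$,

$$\text{(i)} \quad h_T(\mathrm{Src}^{\otimes n}) \ge 1 - \delta, \qquad (E5)$$

$$\text{(ii)} \quad (1 - \epsilon)\, 2^{n(\hat{H}(\mathrm{Src}) - \epsilon)}
  \le |T(n,\epsilon)| \le 2^{n(\hat{H}(\mathrm{Src}) + \epsilon)}. \qquad (E6)$$

Proof. The proof of (i) and (ii) is analogous to [2U, Theorem 12.5] by
noting that

$$h_T(\mathrm{Src}^{\otimes n}) = \sum_{(r_1, \ldots, r_n) \in T(n,\epsilon)}
  e_{r_1}(\mathrm{Src})\, e_{r_2}(\mathrm{Src}) \cdots e_{r_n}(\mathrm{Src}), \qquad (E7)$$

and that the condition characterizing the set $T(n,\epsilon)$ of
$\epsilon$-typical sequences can also be written as

$$2^{-n(\hat{H}(\mathrm{Src}) + \epsilon)}
  \le e_{r_1}(\mathrm{Src}) \cdots e_{r_n}(\mathrm{Src})
  \le 2^{-n(\hat{H}(\mathrm{Src}) - \epsilon)}. \qquad (E8)$$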
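The counting behind the typical set can be made concrete for a two-outcome measurement. The following is a minimal sketch assuming that the optimal measurement $e$ has only two outcomes, with probabilities $1-p$ and $p$ on the source state; the function name `typical_set_stats` and the parameter values are chosen for illustration only.

```python
import math

def typical_set_stats(p, n, eps):
    """Return (H, log2 |T|, P(T)) for the eps-typical set of n i.i.d. binary
    outcomes, where each outcome equals 1 with probability p."""
    h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    size, prob = 0, 0.0
    for k in range(n + 1):
        # All sequences with k ones have the same probability p^k (1-p)^(n-k).
        log_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log_prob / n - h) <= eps:
            count = math.comb(n, k)
            size += count
            prob += 2.0 ** (math.log2(count) + log_prob)
    return h, math.log2(size), prob

n, eps = 400, 0.1
h, log_size, prob = typical_set_stats(p=0.2, n=n, eps=eps)
print(h, log_size, prob)
```

Here `prob` plays the role of $h_T(\mathrm{Src}^{\otimes n})$ in (E7) and `log_size` is $\log |T(n,\epsilon)|$. For these parameters the typical set carries almost all of the probability, and its size lies between the two exponential bounds derived from (E8).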



Given the statement about typical sequences, we can now complete the
proof of Theorem VIII.1. Recall that the source emits a sequence of
states $S_{\vec{k}}$ with probability $q_{\vec{k}}$. To compress the state we
perform the pseudo-projective measurement $h$ given by (E4). If we
obtain outcome 'T' (corresponding to the typical subspace) we output the
post-measurement state $T[S_{\vec{k}}]$, which must lie in the typical
subspace as the measurement is repeatable. Otherwise, we prepare an
arbitrary fixed state in the typical subspace which we will call
$S_{\rm fail}$. The resulting state is thus a mixed state in the typical
subspace of the form

$$h_T(S_{\vec{k}})\, T[S_{\vec{k}}] + h_A(S_{\vec{k}})\, S_{\rm fail}. \qquad (E9)$$

Note that condition (ii) of the theorem tells us that the dimension of
the typical subspace is at most $2^{n(\hat{H}(\mathrm{Src}) + \epsilon)}$. For
any $R > \hat{H}(\mathrm{Src})$, we can therefore find an $\epsilon$ such that
we achieve a compression of rate $R$.

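To decompress, we will do nothing and simply output

$$\tilde{S}_{\vec{k}} := h_T(S_{\vec{k}})\, T[S_{\vec{k}}] + h_A(S_{\vec{k}})\, S_{\rm fail}, \qquad (E10)$$

and so all that remains is to show that $\tilde{S}_{\vec{k}}$ is in fact
close to the original state $S_{\vec{k}}$. Suppose for simplicity that the
maximum is attained when computing the distance, and let $e$ denote the
optimal measurement. That is

$$\sup_f C(f(\tilde{S}_{\vec{k}}), f(S_{\vec{k}})) = C(e(\tilde{S}_{\vec{k}}), e(S_{\vec{k}})). \qquad (E11)$$

We then have

$$\begin{aligned}
\mathcal{D}(\tilde{S}_{\vec{k}}, S_{\vec{k}}) &= C(e(\tilde{S}_{\vec{k}}), e(S_{\vec{k}})) \\
  &\le h_T(S_{\vec{k}})\, C(e(T[S_{\vec{k}}]), e(S_{\vec{k}}))
     + h_A(S_{\vec{k}})\, C(e(S_{\rm fail}), e(S_{\vec{k}})) \\
  &\le h_T(S_{\vec{k}})\, \mathcal{D}(T[S_{\vec{k}}], S_{\vec{k}}) + h_A(S_{\vec{k}}) \\
  &\le c\, h_A(S_{\vec{k}})^{\epsilon} + h_A(S_{\vec{k}}) \\
  &\le (c + 1)\, h_A(S_{\vec{k}})^{\epsilon},
\end{aligned} \qquad (E12)$$

where the first inequality follows from the properties of the classical
trace distance and the linearity of effects, the second from the
definition of distance, the third from the weak disturbance property of
a pseudo-projective measurement, where $c > 0$ and $\epsilon \in (0, 1]$
are constants given by a particular theory, and the last from
$h_A(S_{\vec{k}}) \le h_A(S_{\vec{k}})^{\epsilon}$ for $h_A(S_{\vec{k}}) \in [0,1]$.
We then note that

$$\begin{aligned}
\sum_{\vec{k}} q_{\vec{k}}\, \mathcal{D}(\tilde{S}_{\vec{k}}, S_{\vec{k}})
  &\le (c + 1) \sum_{\vec{k}} q_{\vec{k}}\, h_A(S_{\vec{k}})^{\epsilon} \\
  &\le (c + 1) \Big( \sum_{\vec{k}} q_{\vec{k}}\, h_A(S_{\vec{k}}) \Big)^{\epsilon} \\
  &= (c + 1) \left( h_A(\mathrm{Src}^{\otimes n}) \right)^{\epsilon} \\
  &\le (c + 1)\, \delta^{\epsilon}.
\end{aligned} \qquad (E13)$$

Here the second inequality uses the concavity of $x \mapsto x^{\epsilon}$
for $\epsilon \in (0,1]$, and the equality uses the linearity of effects.
The inequality in the last line follows from the typical subspace
theorem. As $\delta$ can be chosen to be arbitrarily small, this
concludes our proof. □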



